Abstract 

Million-transistor processors are being manufactured today, and soon it will 
be possible to put several million transistors on one integrated circuit. While 
memory applications of this technology are clear, it is not obvious how best to 
use it for computation purposes. One possibility is the architecture of the 
Message-Driven Processor (MDP), which consists of a 32+4-bit CPU, memory, 
and a network interface together on one chip. MDPs can be connected directly 
to each other to form a 65536-processor, message-passing, MIMD, parallel 
computer, the J-Machine. The MDP's architecture is unusual in that it 
provides a very high ratio of processing power to memory. 

Concurrent Smalltalk is the primary language used for programming the 
J-Machine. Concurrent Smalltalk is the language of choice because it fits 
the J-Machine's fine-grain, message-passing model well. This thesis 
describes Concurrent Smalltalk and its implementation on the J-Machine, 
including the Optimist II compiler and Cosmos operating system. Optimist II 
can perform global optimization of programs, including inline function expan- 
sion, type inference, and global evaluation of constant expressions. Next, 
Cosmos and the Concurrent Smalltalk runtime environment are described. 
Finally, some quantitative and qualitative results are presented. The grain 
size (the average amount of time a method executes before suspending) was 
found to be about 60 instructions, and the MDP was found to execute one in- 
struction every two or four cycles, depending on whether external DRAM is 
used. A number of qualitative issues are described, along with a few prelimi- 
nary results for addressing difficult problems such as controlling parallelism. 
Title: Associate Professor of Computer Science and Engineering 

Chapter 1. Introduction 



Goals 

This thesis describes the Concurrent Smalltalk language and its implementation on the Mes- 
sage-Driven Processor. Concurrent Smalltalk, also known as CST, is a concurrent version of 
the object-oriented programming language Smalltalk [20]. The implementation consists of a 
global, optimizing compiler and a streamlined operating system for the J-Machine. 

This thesis covers quite a broad scope of the implementation of Concurrent Smalltalk, includ- 
ing subjects ranging from issues in parallel programming in general and the design of Con- 
current Smalltalk itself to some of the fine points of the design and optimization of the MDP 
architecture. The goal of the thesis is to demonstrate a working implementation of Concurrent 
Smalltalk on the Message-Driven Processor. Although the implementation is not yet 
complete, it does provide hooks for all of the advertised functionality of Concurrent Smalltalk 
and is based on solid ground. Versions of the implementation are running on recently manu- 
factured MDP chips, and I hope that the programs described herein will survive and evolve 
for the next five years. 

Another goal of this thesis was to discover and, whenever possible, fix design flaws in the 
MDP architecture and language specification so as to make an implementation of Concurrent 
Smalltalk practical. Several errors in the MDP architecture and Concurrent Smalltalk were 
found, as well as numerous bugs in the simulation tools used to verify the hardware. 

The next section gives a brief overview of the J-Machine hardware and the Concurrent 
Smalltalk language. It is followed by an outline of the software bridging the gap between 
Concurrent Smalltalk and the MDP hardware: the Optimist II compiler and the Cosmos 
operating system. The relationship of this work to other work in fine-grain concurrent computation 
is then described. 

Second Edition 

This work was originally a Master's thesis completed in May 1989. It has been updated to 
reflect the state of the Optimist II compiler, the Cosmos operating system, and the MDPSim 7.0 
simulator as of the end of May 1991. The Optimist II compiler now produces better code, and 
several Cosmos routines, especially the CFUT fault handler, have been sped up. Furthermore, Cosmos 
has been updated for a few minor architectural revisions. 

The compiler and operating system have been evolving rapidly in the past few months due to 
the recent availability of MDP chips. This document does not include these newest changes, 
which include support for hardware I/O, debugging aids, and workarounds for first-silicon 
chip bugs, as they have little effect on the ideas in this work. Other members of the Concur- 
rent VLSI Architecture group, including Scott Furman, Rich Lethin, Todd Dampier, Shaun 
Kaneshiro, John Keen, and Mike Noakes, are now working on CST applications and Cosmos 
enhancements such as floating-point arithmetic, queue overflow handling, and garbage 
collection. These will be published in separate documents as they are completed. 
1.1. Hardware and Software Architecture 
The J-Machine 

Million-transistor processors are being manufactured today, and soon it will be possible to put 
several million transistors on one integrated circuit. While memory applications of this 
technology are clear, it is not obvious how best to use it for computation purposes. One 
possibility is the architecture of the Message-Driven Processor (MDP), which consists of a 
32+4-bit¹ CPU, memory, and a network interface together on one chip. MDPs can be connected 
directly to each other to form a 65536-processor, message-passing, MIMD, parallel computer, 
the J-Machine [14]. The network is a three-dimensional mesh fast enough to provide 
communication between the farthest pair of processors on a 65536-processor J-Machine in a few 
microseconds; on an unloaded network an 8-word message can be transmitted from one 
corner of the J-Machine to the other in just 4 microseconds. The processors are optimized for 
sending and receiving messages; a processor can be working on a message even before the 
entire message has arrived. The MDP's architecture is unusual in that it provides a very 
high ratio of processing power to memory. 
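To give a feel for the distances involved, the following minimal sketch counts the links a message crosses between two nodes of a three-dimensional mesh. The 32 x 32 x 64 shape is purely an assumption, chosen because it yields 65536 nodes; the text does not state the actual dimensions, and the name `mesh_hops` is hypothetical.

```python
# Hop count between two nodes of a 3-D mesh (no wraparound links assumed).
# The 32 x 32 x 64 shape is an assumption chosen only because it yields 65536 nodes.
from typing import Tuple

Node = Tuple[int, int, int]
MESH_SHAPE: Node = (32, 32, 64)

def mesh_hops(src: Node, dst: Node) -> int:
    """Minimum number of links a message crosses in a mesh (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(src, dst))

corner_a: Node = (0, 0, 0)
corner_b: Node = tuple(d - 1 for d in MESH_SHAPE)
node_count = MESH_SHAPE[0] * MESH_SHAPE[1] * MESH_SHAPE[2]
worst_case_hops = mesh_hops(corner_a, corner_b)
```

Under this assumed shape, the farthest pair of nodes is 125 hops apart.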

The Message-Driven Processor 

The MDP has a register-based architecture and operates on 32-bit data words with 4-bit tags. 
Tags are essential in efficiently supporting late binding for object-oriented languages such as 
Concurrent Smalltalk. In addition, tags are necessary for garbage collection and valuable for 
debugging programs. 
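The role of tags can be illustrated with a small sketch of a tagged word and a primitive that checks tags before operating. The tag values, the names `Word`, `TagFault`, and `add`, and the wrap-on-overflow behavior are all assumptions made for illustration; they are not the MDP's actual encoding or fault model.

```python
# Sketch of a 36-bit tagged word: 32 data bits plus a 4-bit tag.
# Tag values and names below are hypothetical, not the MDP's actual encoding.
from dataclasses import dataclass

TAG_INT = 0      # assumed tag for small integers
TAG_OBJ = 1      # assumed tag for object references
DATA_MASK = 0xFFFFFFFF

class TagFault(Exception):
    """Raised when a primitive sees operands with unexpected tags."""

@dataclass(frozen=True)
class Word:
    tag: int
    data: int

    def __post_init__(self):
        if not 0 <= self.tag < 16:
            raise ValueError("tag must fit in 4 bits")
        if not 0 <= self.data <= DATA_MASK:
            raise ValueError("data must fit in 32 bits")

def add(a: Word, b: Word) -> Word:
    """Integer add that checks tags first; overflow simply wraps in this sketch."""
    if a.tag != TAG_INT or b.tag != TAG_INT:
        raise TagFault("operands of add must both be tagged as integers")
    return Word(TAG_INT, (a.data + b.data) & DATA_MASK)
```

Because the check is attached to the primitive itself, a late-bound caller does not need to know the operand types in advance; a mismatch surfaces as a fault that system software can handle.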

The MDP is message-based. In its normal mode of operation, the MDP listens on the net- 
work for messages. When it receives a message from the network, it stores the message in a 
FIFO input message queue and dispatches on the address given in the first word of the mes- 
sage. Messages are used for all communication tasks, including function and method calls, 
replies, object transfers, and other synchronization facilities. 
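The receive-queue-dispatch cycle described above can be sketched as follows. This is only a software analogy under stated assumptions: the `Node` class, the handler addresses `0x10` and `0x20`, and the list-of-integers message layout are hypothetical, and the real MDP performs these steps in hardware.

```python
# Sketch of message-driven dispatch: queue incoming messages FIFO and
# jump to the handler named by the first word. All names are hypothetical.
from collections import deque
from typing import Callable, Deque, Dict, List

Message = List[int]                      # word 0 = handler address, rest = arguments
Handler = Callable[[List[int]], int]

class Node:
    def __init__(self) -> None:
        self.queue: Deque[Message] = deque()
        self.handlers: Dict[int, Handler] = {}
        self.results: List[int] = []

    def receive(self, message: Message) -> None:
        """Store an arriving message at the tail of the input queue."""
        self.queue.append(message)

    def run(self) -> None:
        """Dispatch queued messages in arrival order."""
        while self.queue:
            address, *args = self.queue.popleft()
            self.results.append(self.handlers[address](args))

node = Node()
node.handlers[0x10] = lambda args: sum(args)       # assumed "add" handler
node.handlers[0x20] = lambda args: max(args)       # assumed "max" handler
node.receive([0x10, 1, 2, 3])
node.receive([0x20, 7, 4])
node.run()
```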

A detailed but slightly obsolete description of the MDP architecture is in [16]; an updated 
summary is presented in Appendix D. MDPSim [24] [25] is an instruction level simulator, 
assembler, and debugger used to run MDP assembly language programs and test the operat- 
ing system. 

Concurrent Smalltalk 

Concurrent Smalltalk is the primary language used to program the J-Machine. One of the 
main goals of designing Concurrent Smalltalk was to take advantage of the J-Machine's 
unique features. A new software architecture was needed that would efficiently support fine- 
grain, message-passing computation. Whereas some existing parallel computers have mes- 
sage routing times measured in milliseconds, the routing time for a message sent from one 
end of even a large J-Machine to another is on the order of several microseconds. Operating 
system overhead on processing and dispatching that message of more than a few microsec- 
onds is not acceptable. 

¹ Each word consists of 32 bits of data and a 4-bit tag. 

Concurrent Smalltalk introduces concurrency to standard Smalltalk by evaluating 
arguments to method calls in parallel and by allowing the computation of the value of a 
variable to proceed in parallel with the other computations of a method until the variable's value 
is actually needed. Furthermore, Concurrent Smalltalk adds distributed objects to Smalltalk. 
A distributed object is an object that can process many methods at the same time without 
any serialization bottlenecks other than those required by the algorithm in use. Although 
standard objects can also process several methods simultaneously, they can only dispatch on 
one method at a time¹. 
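As an analogy for evaluating arguments in parallel and waiting for a value only when it is needed, the sketch below uses Python's standard thread pool. It is not Concurrent Smalltalk and says nothing about how the J-Machine implements this; the function names are hypothetical.

```python
# Analogy only: evaluate the arguments of a call concurrently and wait for
# each value only where it is used. Thread pools stand in for CST's futures.
from concurrent.futures import ThreadPoolExecutor

def slow_square(x: int) -> int:
    return x * x

def slow_double(x: int) -> int:
    return 2 * x

def combine(a: int, b: int) -> int:
    return a + b

with ThreadPoolExecutor(max_workers=2) as pool:
    # Both arguments start evaluating before either value is needed.
    arg_a = pool.submit(slow_square, 6)
    arg_b = pool.submit(slow_double, 5)
    # .result() plays the role of touching a future: block until the value exists.
    total = combine(arg_a.result(), arg_b.result())
```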

Concurrent Smalltalk is an ideal language for programming the J-Machine because it is easy 
to parallelize and yields small, fine-grain methods as well as a considerable amount of flexi- 
bility in the system software implementation. The methods dealing with a particular class 
can travel to the data object as opposed to the data traveling to the code. Concurrent 
Smalltalk also provides excellent facilities for creating data abstractions; the Optimist II 
compiler amplifies this power by providing global optimizations so performance does not suf- 
fer because abstractions are used. 

Another advantage of Concurrent Smalltalk is that it is low-level enough to be useful in im- 
plementing parts of the J-Machine runtime system, while being at a level high enough that 
the programmer does not have to worry about the infamous problems of parallel process syn- 
chronization and deadlocks. In fact, once the data structures are defined properly, pro- 
gramming in Concurrent Smalltalk feels much like programming in a standard sequential 
language. 



¹ This restriction is relaxed for immutable standard objects because they may be copied at the operating system's 
discretion. Nevertheless, a distributed object can be mutable and still have no synchronization bottlenecks. 

1.2. Overview 

Foundations 

Some of the pieces comprising the Concurrent Smalltalk environment were available before 
this thesis was done. A primitive compiler was available [21], as were a description of the 
operating system kernel [38], several descriptions of the language [13] [21] [17], and an MDP 
assembly language simulator (MDPSim 5.2) [24]. Unfortunately, none of the pieces really fit 
together: the various versions of the language were inconsistent, and the output of the compiler 
was incompatible with the untested operating system kernel, which itself was written for an 
obsolete version of the MDP architecture [23]. 

It became clear that it would be easier to design the language, the compiler, and the operat- 
ing system from scratch than to try to fit the existing pieces together. Nevertheless, the ex- 
isting code and ideas were useful as guides to which approaches would likely yield good re- 
sults and which techniques should be abandoned. I took advantage of this opportunity to 
extend Concurrent Smalltalk to support several programming styles and add functions, clo- 
sures, continuations, arrays, nested local variables, and inline classes to produce a language 
with a compact implementation yet powerful libraries. The new features did not complicate 
implementation; in fact, by providing a small set of fundamental primitives, the new features 
often simplified the implementation of existing functionality, a phenomenon noticed in the 
design of the Scheme language [31] [1]. 

The contributions of this thesis include: 

• A redesign of the Concurrent Smalltalk language. 

• Optimist II, a new Concurrent Smalltalk compiler and interpreter. 

• Cosmos (Concurrent Smalltalk Operating System), an operating system that supports 
Concurrent Smalltalk on the MDP. 

• Runtime libraries for Concurrent Smalltalk. 

• Modifications to MDPSim, the MDP assembler/simulator, to facilitate downloading pro- 
grams, simplify debugging, and collect performance measurements. 

• Modifications to the MDP architecture that make it more suitable for Concurrent 
Smalltalk. 

I am indebted to Scott Wills and Andrew Chien for helping with the redesign of the Concur- 
rent Smalltalk language, and Richard Lethin, John Keen, and Stuart Fiske for helping with 
the MDP architecture changes. Professor William Dally supervised the project. 

System Overview 

The Optimist II Compiler 

The Optimist II compiler continues in the tradition of the Optimist compiler by compiling 
Concurrent Smalltalk to assembly code that is as small as possible without sacrificing speed. 
In addition, Optimist II contains an interactive Concurrent Smalltalk interpreter that is 
useful for prototyping and debugging Concurrent Smalltalk programs at the source level. 
Optimist II is also a platform for experimenting with compiler optimizations. Global opti- 
mizations such as function inlining and the reduction of method calls to function calls were 
added and found to be highly successful. 
The compiler itself is divided into several phases, which are described in more detail in 
Chapter 3. It produces an MDPSim command file which can be downloaded into MDPSim 
and run on a simulated J-Machine. 



Cosmos 

Cosmos is the operating system used on the Message-Driven Processor to support code 
output by Optimist II. Many of the ideas in Cosmos are borrowed from JOSS [38], written by 
Brian Totty; JOSS introduced the concept of a Birth/Residence Address Table (BRAT) and 
the protocol for migrating objects between processors. Nevertheless, Cosmos's code bears 
little resemblance to JOSS. 
Figure 1-1. Software Environment Organization 

A Concurrent Smalltalk program can be either compiled or interpreted by the Optimist II compiler. Interpretation is 
useful to debug Concurrent Smalltalk programs and interactively experiment with language features. When a 
Concurrent Smalltalk program is compiled, it is loaded into MDPSim, a J-Machine simulator, together with the 
Cosmos operating system. MDPSim will then run the program to obtain its results as well as program perfor- 
mance statistics. 

The main goals of Cosmos were to make a working operating system, make it as efficient as 
possible, and make it simple, all subject to the time constraints of a Master's thesis. Those 
three goals have been achieved to a large extent, in that the operating system does work, and 
simple programs have been run on it. Unfortunately, controlling a large parallel computer is 
a difficult task, and Cosmos still falls short in many ways which are described in Chapter 8. 
In particular, higher-level resource management and load balancing issues are yet to be 
adequately addressed. Nevertheless, Cosmos is a good start and a platform for experimenting 
with the more difficult problems. 

Example 

A very simple example of the use of the system to compile and run a factorial program is 
listed below. Please refer to Chapter 5 for a more detailed example of the transformations in 
the compiler and Appendices B and C for information about using the compiler and the 
operating system. 

CST: (defun fact (n) 
       (if (<= n 1) 
           1 
           (* n (factorial (- n 1))))) 
#<Cst-Lambda 5090060 FACT> 
CST: (fact 3) 

When interpreting: (FACT 3) 
Error: Unbound global FACTORIAL 
> Break: 
> Type Command-/ to continue, Command-. to abort. 
1 > Continuing...Fatal error: Can't apply #<Nil> 
> Break: 
> Type Command-/ to continue, Command-. to abort. 
1 > Continuing... 

CST: (defun fact (n) 
       (if (<= n 1) 
           1 
           (* n (fact (- n 1))))) 
#<Cst-Lambda 4920924 FACT> 
CST: (fact 4) 
#<Integer 24> 
CST: (compile fact "NewFact.mdp") 

Figure 1-2. Compiling Fact 

The user entered a factorial function, corrected an error in it, tested it on a sample input, and then compiled it into 
MDP assembly code in the NewFact.mdp file. The user's input is shown in bold. 

First the user starts the compiler and enters the compiler's interactive mode (see Appendix 
B) as shown in Figure 1-2. He enters the fact function and runs it only to find an error — fact's 
recursive call should be to fact, not factorial. The user corrects the error and then uses the 
compiler's interpreter to successfully compute the factorial of 4. 

Afterwards the user compiles fact to MDP assembly code, quits Optimist II, and launches 
MDPSim, where he loads the object file and calls fact on 4 to get the correct answer, 24 
(Figure 1-3). The stats command can then be used to determine some running statistics, 
such as the frequencies of instructions executed, the amount of parallelism used, and the to- 
tal time taken to run the program. Starting from a cold start, fact takes 725 steps on a 2x2x1 
J-Machine to compute its answer. 

Implementation 

The Optimist II compiler is written in CLOS [27], the Common Lisp Object System. Except 
for the use of the LOOP iteration macro [7], Optimist II adheres to standard Common Lisp as 
specified in [35] and amended in [6] and in the amendments specified by the Common Lisp 
Cleanup Committee that were available at the time of this writing. The LOOP macro is itself 
written in standard Common Lisp, so Optimist should run on any machine with a faithful 
implementation of Common Lisp. A slightly modified version of the 12/7/88 version of Xe- 
rox's PCL was used to implement a subset of CLOS before Apple Common Lisp 2.0 became 
available. 
MDPSim -x 2 -y 2 -msize 0x1000 ::Cosmos:Cosmos.m NewFact.mdp 

Message-Driven Processor Simulator 
Version 7.0 Rev B 
Accompanies MDP Architecture Document 11B 
Written by Waldemar Horwat 
Architecture Updates by Brian Totty and Jerry Larivee 
UROPs for Bill Dally 

4 MDPs present. 
@0..3}MESSAGE fact4 
Message}MSG:msgApplyFunction|5 
Message}{fFact} 
Message}4 
Message}IONODE 
Message}0 
Message}END 
@0..3}inject fact4@3 
@0..3}resetstats 
@0..3}run 
Tick 724 Received priority message: 
OBJ:$801BB004 u=1 f=0 offset=$006EC=Reply length=$0004 
INT:$0000FC00 = 64512 
INT:$00000000 = 0 
INT:$00000018 = 24 
@0..3}stats 
725 ticks executed. 
... More statistics ... 

Figure 1-3. Running Fact 

The user loaded the fact object code and typed a few magic incantations that invoked the fact function on the input 
4 (the third word in the injected message). The result 24 (the fourth word in the ejected message) was returned 
after 725 steps on a 4-node J-Machine. Most of the time was spent distributing the fact code throughout the J-Ma- 
chine; the second time it only takes 498 steps to compute the answer (some code is still being distributed), the 
third time takes 289 steps, and afterwards the execution time is about 265 steps. 
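The share of each run spent on one-time code distribution can be estimated from the step counts quoted in the caption. The sketch below treats the "about 265 steps" figure as the steady state, which is an approximation; the variable names are illustrative.

```python
# Step counts quoted in the Figure 1-3 caption for successive runs of fact on 4.
run_steps = [725, 498, 289]
steady_state_steps = 265        # "about 265 steps" once the code is distributed

# Extra steps in each run relative to the steady state.
overhead = [steps - steady_state_steps for steps in run_steps]
cold_start_fraction = overhead[0] / run_steps[0]
```

By this estimate, roughly 460 of the 725 cold-start steps, a little under two thirds, go to distributing code rather than computing the answer.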

Optimist II was developed on a Macintosh using Apple Common Lisp 1.2.2 and 2.0 written by 
Coral Software Corp (now merged with Apple Computer, Inc.). It runs on a 5-megabyte Mac- 
intosh II, although 8 megabytes are recommended and at least 16 are needed to run Optimist 
II and MDPSim simultaneously. 

Cosmos is written in MDP assembly language [16]. MDPSim [24] [25] was used as an 
assembler and simulator for a small J-Machine. 

All of the software needed to compile and run Concurrent Smalltalk programs exists on both 
a Macintosh II platform and on Sun workstations. 

Results 

The primary result of this work is a demonstration of a working implementation of Concur- 
rent Smalltalk on a J-Machine. In addition, a number of secondary results were obtained. 
These include the qualitative and quantitative benefits of optimizations in the Optimist II 
compiler, data on the expected grain size (the number of instructions executed in response to 
a message), and a number of qualitative observations about the shortcomings of the current 
system. The results did not always come out as expected. For example, the finding that the 
grain size is about 60 instructions was surprising; it was expected to be much lower. Code 
statistics indicate that the MDP will take about 1.9 cycles per instruction, although most in- 
structions execute in 1 cycle; if slow external DRAM is used to hold user programs and data, 
the MDP could take as many as 3.5 cycles per instruction. Network loading calculations indi- 
cate that network congestion will become a concern when the size of the J-Machine exceeds 
343 nodes; either a faster network or some means of exploiting locality will be needed for 
larger J-Machines. 
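Combining the grain size with the two cycles-per-instruction figures gives a rough cost per message. This is simple arithmetic on the numbers quoted above, not a measurement from the thesis; the variable names are illustrative.

```python
# Figures quoted in the Results paragraph above.
grain_instructions = 60          # average instructions executed per message
cpi_without_dram = 1.9           # cycles per instruction without external DRAM
cpi_external_dram = 3.5          # cycles per instruction with slow external DRAM

cycles_per_grain = grain_instructions * cpi_without_dram
cycles_per_grain_dram = grain_instructions * cpi_external_dram
slowdown = cpi_external_dram / cpi_without_dram
```

On these figures a typical message costs on the order of 114 cycles, or about 210 cycles when programs and data live in external DRAM, a slowdown of roughly 1.8 times.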

The quantitative results are listed in Chapter 7, while the qualitative ones are in Chapter 8. 
Chapter 8 may seem a little pessimistic, but many of the current shortcomings listed there 
would not have been found had this work not been done; furthermore, the current implemen- 
tation of Cosmos provides a great, highly accurate platform for research into the issues pre- 
sented there. 

Caveats 

Due to the availability of only a finite amount of time for writing this thesis, which could po- 
tentially involve an infinite amount of work, some features could not be included in the cur- 
rent implementation of Concurrent Smalltalk. The biggest omission is the lack of garbage 
collection — if enough storage isn't reclaimed, the machine will fail. Garbage collection, 
although interesting, was omitted to keep this project to a reasonable size — a good garbage 
collector and load manager would require more effort than is desirable for a Master's thesis. 

Full futures were also not implemented. They were omitted from the interpreter in the com- 
piler because simulating them is difficult on a sequential machine in a sequential language 
(Common Lisp). Futures were omitted from the run-time system because of the considerable 
amount of work needed to implement all the fault handlers and special cases involved. Ne- 
vertheless, almost all Concurrent Smalltalk programs still attain reasonable parallelism 
through the use of cfutures¹, which are fully operational. 

Other features that were not implemented are I/O facilities at both the Optimist II and Cos- 
mos levels and runtime support for local (non-distributed) arrays and floating point numbers. 
I/O facilities, while useful, do not contribute much to the project and are easy to add later. 
Local arrays and floating point numbers are supported by the Optimist II compiler but not 
the runtime system; supporting them at the runtime level will require writing MDP assem- 
bly language; no major surprises are expected there. 

Some of the optional features of Concurrent Smalltalk were not included due to a lack of 
time. All class inline declarations are currently ignored; I anticipate that it will be possible 
to inline objects inside other objects sometime in the future, but that is not a high priority at 
this time. The omission of class inlining does not change the semantics of Concurrent 
Smalltalk programs. Function inlining is more useful, and it does work now. 

Reading Guide 

The remainder of this chapter describes related work in fine-grain concurrent computation. 
The succeeding chapters delve into various aspects of the system, starting from the top — 
Chapter 2, Concurrent Smalltalk, provides an introduction to the Concurrent Smalltalk 
language in general. Chapter 3, The Optimist II Compiler, describes the Concurrent 
Smalltalk compiler and interpreter. Chapter 4, The Cosmos Operating System, describes 
the operating system. To avoid overlap, the compiler features documented in [21] are not 
documented here; thus, it might be helpful to consult [21] when reading Chapter 3. 

Chapter 5, Sample Program, traces the progress of a sample program from the Concurrent 
Smalltalk source level down to object code. Chapter 6, Debugging, provides some debugging 
techniques for Concurrent Smalltalk and MDP programs. Chapters 7, Performance Mea- 
surements, and 8, Future Evolution, present the results of this work. Chapter 7 contains 
quantitative measurements of the performance of Cosmos and the compiled code, while 
Chapter 8 describes some of the less tangible, qualitative shortcomings of the current system 
and ideas for correcting them. Chapter 9, Conclusion, concludes the main body of the the- 
sis. 



¹ A cfuture, also called a context future, is a local future which cannot be passed outside the function without being 
touched (i.e., replaced by its value). 

The appendices parallel the main chapters with more detailed information. Appendix A, 
Concurrent Smalltalk Reference, is the most important, for it contains the specification 
of Concurrent Smalltalk. Appendix B, Using Optimist II, provides a detailed description of 
the Optimist II features not listed in Appendix A. Similarly, Appendix C, Using Cosmos, is 
a guide to running Cosmos on MDPSim; the latest MDPSim reference manual [25] should 
also be consulted when running Cosmos. Appendix D, MDP Architecture Summary, 
summarizes the current version of the MDP architecture. Finally, Appendix F, Cosmos 
Listing, contains a listing of the entire operating system. 

Since this thesis also serves as a reference manual for Concurrent Smalltalk, Chapter 2 and 
Appendices A and B have been indexed. The index appears at the end of the thesis. 
1.3. Related Work 

The ideas of optimizing Smalltalk and running object-oriented software on concurrent, fine- 
grain systems are not new, but they have not been integrated previously to the extent found 
on the J-Machine. While most of the efforts concentrated on either optimizing Smalltalk for 
conventional computers or developing radically new programming methodologies, Concurrent 
Smalltalk presents a somewhat conventional Smalltalk environment to the programmer 
(with a few new features such as futures and distributed objects), which is at the same time 
efficiently implemented on a fine-grain parallel computer. 

A major contribution of this work is the actual optimized implementation of Concurrent 
Smalltalk on an assembly language architecture. While theoretical studies and simulations 
in higher-level languages can yield asymptotic and qualitative results, an implementation 
yields the constant factors determining a system's performance. These performance mea- 
surements are an important part of this work, as they indicate the relative costs of the primi- 
tive operations and can be used to gauge the true performance of a concurrent computer. 

Smalltalk Systems 

Smalltalk-80 

Early Smalltalk-80 optimization efforts such as [18] concentrated on optimizing Smalltalk 
within the constraints of the byte code interpreter. In addition, the work was limited by the 
Smalltalk-80 constraints of making contexts and methods program-visible data structures, 
which required some effort to convert between the optimized and standardized versions of 
the structures. Several context optimizations are also presented in [18], including 
determining which contexts can be referred to as first-class data objects and which contexts can 
be pointed to by blocks. Most contexts do not fall into either category, and they can be placed 
on the stack. Such optimizations are now also commonly done in Lisp compilers [36]. 

Whereas early Smalltalk-80 implementations were constrained to compatibility with byte 
codes and were run on stack machines, Concurrent Smalltalk is bound by neither constraint. 
The formats of contexts and method code are not defined in the language, and there are no 
portable means to store a pointer to a context in a programmer-visible variable. Thus, Opti- 
mist II and Cosmos can use the most efficient format for a context or even several different 
formats if they so desire. Furthermore, the MDP is not a stack-based machine, so there are 
no clear advantages to determining which contexts will be live for a long time. Also, contexts 
are fully self-contained, so a closure cannot refer to a context. Finally, several techniques are 
used to optimize closures. As will be seen in Chapter 3, when a closure is created, either the 
lexical variables are copied into the closure, or a common object is made to which both the 
context and the closure refer. 
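The two closure strategies named above can be contrasted in a few lines. This Python sketch only illustrates the difference in behavior; the `Cell` class and function names are hypothetical and do not reflect how Optimist II actually represents closures (see Chapter 3 for that).

```python
# Two ways a closure can capture a lexical variable, sketched in Python.
# Class and function names are hypothetical; this is not Optimist II's code.
from typing import Callable

class Cell:
    """Shared object that both the context and the closure refer to."""
    def __init__(self, value: int) -> None:
        self.value = value

def make_copying_closure(x: int) -> Callable[[], int]:
    # Strategy 1: copy the variable's current value into the closure.
    captured = x
    return lambda: captured

def make_sharing_closure(cell: Cell) -> Callable[[], int]:
    # Strategy 2: keep a reference to a common object holding the variable.
    return lambda: cell.value

x = 10
copying = make_copying_closure(x)
x = 11                      # later assignment in the context is not seen

shared = Cell(10)
sharing = make_sharing_closure(shared)
shared.value = 11           # later assignment is visible through the cell
```

Copying is the cheaper of the two but, as the sketch shows, it only gives the same answer as sharing when the variable is not assigned after the closure is created.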

Optimized Sequential Smalltalk 

A few years later it became clear that global analysis and optimization were necessary to op- 
timize Smalltalk programs further. Optimizing Smalltalk well required an ability to convert 
method dispatches into more efficient function calls, which gave rise to several type systems 
for Smalltalk [5] [26]. When a type system could be applied to a Smalltalk program, the 
compiler could optimize it by a factor of 5 to 10 over interpreted Smalltalk. The main com- 
piler optimizations of TS [26] are similar to those of Optimist II: Both TS and Optimist II 
can convert a message send into a case statement of procedure calls, substitute functions in- 
line, and optimize tail recursion. In addition, TS can beta-reduce blocks, which Optimist II 
currently cannot do. On the other hand, Optimist II contains a number of other powerful 
dataflow optimizations (see Chapter 3 and [21]) commonly found in C compilers, which make 
its assembly language output close to optimal. Moreover, Optimist II can evaluate large con- 
stant expressions at compile time, and it can infer types of variables, allowing it to produce 
good code even though type declarations in Concurrent Smalltalk are completely optional. 
TS, on the other hand, has difficulties combining typed code with untyped code. 
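Converting a message send into a case statement of procedure calls can be pictured as follows. The classes, the `area` message, and the lookup table are hypothetical examples written in Python; they stand in for the general idea rather than for the output of TS or Optimist II.

```python
# Sketch: replace a dynamic message send with a case statement of direct calls.
# The classes and the `area` message are hypothetical examples.
class Square:
    def __init__(self, side: float) -> None:
        self.side = side

class Rect:
    def __init__(self, w: float, h: float) -> None:
        self.w, self.h = w, h

def square_area(s: Square) -> float:
    return s.side * s.side

def rect_area(r: Rect) -> float:
    return r.w * r.h

METHODS = {(Square, "area"): square_area, (Rect, "area"): rect_area}

def send(receiver, selector: str):
    """Unoptimized form: look the method up at run time."""
    return METHODS[(type(receiver), selector)](receiver)

def area_specialized(receiver):
    """Optimized form: the inferred set of possible classes becomes a case
    statement of direct calls, with the general send as a fallback."""
    if type(receiver) is Square:
        return square_area(receiver)
    if type(receiver) is Rect:
        return rect_area(receiver)
    return send(receiver, "area")
```

Once the send has become direct calls, each branch is also a candidate for inline substitution, which is where the two optimizations reinforce each other.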

The MDP hardware also plays an important role in making Optimist II efficient. By provid- 
ing tags and checking them on primitive operations, the MDP architecture frees Optimist II 
from the difficult and often unrewarding process of analyzing programs trying to determine 
information such as whether an integer variable could contain a large-integer (an integer 
which does not fit into a single 32-bit word) or whether the arguments to + are known to be 
numbers. Although this information is generally difficult to determine, in most cases 
integers are small and the arguments to arithmetic primitives are usually numbers, so 
hardware tag-checking is the right approach to this problem. Thanks to the MDP hardware, 
even if Optimist II cannot determine the type of some expression, performance does not 
suffer too much. 

ConcurrentSmalltalk 

A recent language close to Concurrent Smalltalk and having an almost identical name is 
CONCURRENTSMALLTALK [39] [40] independently developed by Yasuhiko Yokote and Mario 
Tokoro. CONCURRENTSMALLTALK shares with Concurrent Smalltalk the cfuture facility 
(called a CBox in CONCURRENTSMALLTALK) and the ability to process messages asyn- 
chronously. In addition, CONCURRENTSMALLTALK defines atomic objects, which Concur- 
rent Smalltalk does not have but can easily emulate using locks. On the other hand, Concur- 
rent Smalltalk includes distributed objects, which CONCURRENTSMALLTALK does not pro- 
vide. Furthermore, the implementation of Concurrent Smalltalk is more optimized. 
Whereas CONCURRENTSMALLTALK is implemented as a byte code interpreter, Concurrent 
Smalltalk compiles to assembly language. 
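The remark that atomic objects can be emulated with locks can be sketched in Python, with `threading.Lock` standing in for a Concurrent Smalltalk lock. The `AtomicCounter` class is a hypothetical example, and the sketch assumes nothing about how locks are implemented in Cosmos.

```python
# Sketch: emulate an "atomic object" by guarding every method with one lock.
# Python's threading.Lock stands in for a Concurrent Smalltalk lock.
import threading

class AtomicCounter:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def increment(self) -> None:
        # Only one thread at a time may run the body of a method.
        with self._lock:
            self._value += 1

    def value(self) -> int:
        with self._lock:
            return self._value

def hammer(counter: AtomicCounter, times: int) -> None:
    for _ in range(times):
        counter.increment()

counter = AtomicCounter()
threads = [threading.Thread(target=hammer, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```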

The two languages have somewhat different flavors. CONCURRENTSMALLTALK is very close 
to Smalltalk-80, and most of the concurrent features are add-ons that have to be explicitly 
requested by the programmer. Concurrent Smalltalk makes concurrency the default, and 
the programmer has to explicitly request sequential processing if he wants it. At the same 
time, the MDP hardware assists Concurrent Smalltalk by making the use of concurrency 
very cheap. For example, a hardware tag is provided that implements cfutures in Concurrent
Smalltalk with much less overhead than CBoxes in CONCURRENTSMALLTALK.

In [40] several changes to the original CONCURRENTSMALLTALK are discussed. Blocks are 
treated differently depending on whether they were created by atomic objects' contexts or not. 
Concurrent Smalltalk's model of only having one kind of object and using locks where neces- 
sary to make atomic transactions does not lead to these difficulties. Finally, secretary objects 
were introduced to CONCURRENTSMALLTALK to keep track of which threads are waiting for 
a resource. An equivalent facility is used internally in locks in Concurrent Smalltalk. 

Actor Systems 

Another recent development in object-oriented programming was the rise of actor systems
[2]. An actor system is a programming paradigm in which simple self-contained entities 
called actors communicate with each other to run a program. Much of the program's content 
is held in the interconnections among the actors. From the implementation standpoint,
Concurrent Smalltalk shares many ideas with actor systems, but the language itself is not
designed exclusively as an actor language. Instead, Concurrent Smalltalk is, as a language,
closer to Smalltalk and Lisp, although actor-like programs can be written in it without too
much trouble.

Cantor 

Cantor [4] is both a programming language and a formalism for reasoning about the prob- 
lems that arise in fine-grain, message-passing parallel computers. In Cantor each object (the 
Cantor equivalent of a Concurrent Smalltalk context) can only perform a bounded amount of 
computation on receiving a message, and that computation is atomic. Also, messages sent 
from one object to another are guaranteed to arrive in the original order. Concurrent
Smalltalk is similar to Cantor at the implementation level: when a message is sent to a
context, it performs a bounded amount of computation¹, perhaps sends a few more messages,
and then either suspends or waits for the next message. The state of a computation is com- 
posed mostly of idle objects and messages traveling between objects, with only a few objects 
executing. Hence, at a superficial level, a Concurrent Smalltalk object code program is a 
Cantor program. Nevertheless, the Concurrent Smalltalk object code program is more com- 
plicated because it might fault while performing the computation of the next state. One can 
view this possibility as either computation being non-atomic or treating faults as if they were 
message sends and suspends, preserving the Cantor model. Another distinction is that Con- 
current Smalltalk does not guarantee that messages between a pair of objects will arrive in 
the order in which they were sent. 

Probably the best relationship between Concurrent Smalltalk and Cantor is that Concurrent 
Smalltalk is a high-level language that compiles to Cantor-like object code. At the source 
level, Concurrent Smalltalk frees the programmer from the myriad of error-prone synchro- 
nization details found in Cantor. Concurrent Smalltalk encapsulates the Cantor concept of 
future flow into a few easy-to-use primitives such as touch and nconcurrently. At the 
same time, Concurrent Smalltalk presents the appearance of global and nested data struc- 
tures (such as lexical scoping of local variables) which are compiled into interacting objects. 

Nevertheless, Cantor is a good theoretical model for computation on the J-Machine. For
example, the load balancing and management results in [4] are expected to also apply to the
J-Machine. However, the J-Machine can also suffer from problems not discussed in [4], such as
having too much parallelism. Some of the load balancing issues are presented in Chapter 8.

Acore 

Acore [30], an "actor core language," is another recent actor language. Like Cantor, it pro- 
vides an environment in which a computation is done by interacting actors with limited abili- 
ties; however, actors in Acore can compute arbitrary functions to determine state, and Acore 
has a notion of a transaction (a message send and a reply), which greatly simplifies pro- 
gramming. 

Acore and Concurrent Smalltalk are similar in many ways. Both languages implement mes- 
sage sends, replies, concurrent evaluation of subexpressions, local variables, static scoping, 
and instance objects (called actors in Acore). However, there are also a few differences. Due 
to its Smalltalk-80 heritage, Concurrent Smalltalk permits local variables to be altered, 
while Acore does not; both languages allow mutation of instance variables. In addition, 
Acore implements a sponsorship mechanism for higher-order control of the course of a com- 
putation and a complaint mechanism for handling exceptions. It remains to be seen whether 
these mechanisms will be necessary in Concurrent Smalltalk².

Acore is compiled into Pract, which is a form of an actor assembly language, whereas Con- 
current Smalltalk is compiled into MDP assembly language. As a result of this difference, 
some actions which are cheap in one language are expensive in the other, which affects the 
language design. Actor creation is very cheap in Acore, while instance object creation, mod- 
erately expensive in Concurrent Smalltalk, is avoided whenever possible. On the other hand, 
futures are fairly expensive in Acore, while they are very cheap in Concurrent Smalltalk; 
thus, Concurrent Smalltalk creates a future (or a cheaper cfuture) as a result of every non- 
primitive function call, achieving maximum concurrency within a method in most cases. 
Acore, on the other hand, often has to do a relatively expensive join operation. For the same 



¹ As will be discussed in Chapters 5 and 10, the amount of computation done by a Concurrent Smalltalk process on
receiving a message truly is bounded, but for a more prosaic reason than keeping a clean model: user
Concurrent Smalltalk methods are not allowed to loop without a message send somewhere to break the loop, which
prevents the incoming message queues on an MDP from overflowing if the loop lasts for a long time. Also, long,
indivisible loops would degrade latency for other messages that are waiting in an MDP's incoming message queue.
reason, futures are transparent in Concurrent Smalltalk, while they are programmer-visible
in Acore¹.

The two languages use the same mechanism for calling messages. When a Concurrent 
Smalltalk process or an Acore actor makes a function or method call, it passes a continuation 
to which results should be sent. The continuation includes both a process and a slot within 
that process in which the result should be stored. 

J-Machine References 

[13] and [14] are good descriptions of the philosophy of the J-Machine project and the early 
Concurrent Smalltalk language; [15] is a recent status report on the MDP from the hardware 
perspective. [22] describes some of the experiences gained from designing the previous ver- 
sion of Concurrent Smalltalk and implementing the first-generation Optimist compiler. [10] 
contains a nontrivial program written in an older dialect of Concurrent Smalltalk. [8] and 
[9] describe Concurrent Aggregates, a higher-level language than Concurrent Smalltalk for 
programming the J-Machine. [33] and [34] describe a parallel project to implement dataflow 
on the J-Machine. Finally, [41] and [42] analyze the desirability of supporting the more 
common existing parallel programming paradigms on the J-Machine. 



² A complaint mechanism could be built on top of Concurrent Smalltalk by using the multiple-value return feature:
one of the values could denote a continuation to which exceptions should be routed. Acore uses a similar
implementation to handle exceptions.

¹ Nevertheless, a language that hides futures could be built on top of Acore.
Introduction

A Concurrent Smalltalk program is a sequence of top-level definitions. Figure 2-1 shows a 
sample program that calculates Fibonacci numbers using double recursion. 

(Defmethod fib Integer ()
  (if (<= self 2)
      1
      (+ (fib (- self 1)) (fib (- self 2)))))

Figure 2-1. A simple Fibonacci program 

This program calculates Fibonacci numbers using double recursion. Although it does not use the most efficient al- 
gorithm to calculate Fibonacci numbers, it does illustrate Concurrent Smalltalk's implicit concurrency. 

The program is a single method associated with the selector fib and class integer. The fact 
that the method takes no arguments other than the integer receiver is indicated by the empty 
list, (), on the first line. The following three lines contain the body of the method. Self rep- 
resents the receiver object, which is the number to which fib was applied. The if statement 
checks whether that number is less than or equal to 2. If so, fib returns 1. Otherwise, fib 
returns the sum of (fib (- self 1)) and (fib (- self 2) ), which are computed con- 
currently. This concurrent evaluation of arguments is one of the important differences be- 
tween Concurrent Smalltalk and sequential Smalltalk. 
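
The concurrent evaluation of the two recursive calls can be imitated in a conventional language. The following is only a Python sketch, not Concurrent Smalltalk: the names fib and fib_concurrent are illustrative, the thread pool stands in for the J-Machine's message-driven tasks, and only the top-level pair of arguments is evaluated concurrently to keep the example small.

```python
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    """Sequential double recursion with fib(1) = fib(2) = 1, as in Figure 2-1."""
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)


def fib_concurrent(n):
    """Evaluate the two arguments of the outermost + concurrently."""
    if n <= 2:
        return 1
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(fib, n - 1)    # first argument of +
        right = pool.submit(fib, n - 2)   # second argument of +
        # The addition needs both values, so it waits for each result here.
        return left.result() + right.result()
```

In Concurrent Smalltalk the same split happens implicitly at every level of the recursion, with no pool or explicit wait written by the programmer.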

Fib can be invoked by calling it on an integer (the receiver object): 

(fib 30)

Fib would then calculate and return the answer 832040. If fib had any more arguments,
they would be included after the receiver object, as in: 

(fib 30 x y z) 

Functions 

The Fibonacci program was defined as a method. It is also possible to define it as a function, 
as in Figure 2-2. A function is a method not associated with any class or selector. Although 
in this example methods and functions are equivalent, in other cases, such as in iterators, 
functions may be more useful than methods. 



(Defun ffib (n)
  (if (<= n 2)
      1
      (+ (ffib (- n 1)) (ffib (- n 2)))))

Figure 2-2. A simple Fibonacci program as a function 

Functions have no receiver object, so the parameter n has to be specified explicitly. 

The syntax for a method and a function call is the same, so ffib would also be called by: 

(ffib 30)

The meaning of applying ffib to arguments (30 in this case) depends on whether ffib is a
selector or a function. If ffib were a selector, a method lookup would be done to determine 
the class of the first argument and then call the method corresponding to the selector and 
that class, while if ffib is a function, it is called directly. 
Extracting Methods

A manual method lookup can be done using the method primitive. Method takes two pa- 
rameters, a selector and a class, and returns a function which performs the same action as 
the method. For example, the method shown in Figure 2-1 can be extracted using 

(method fib integer) 

The result behaves just like the ffib function in Figure 2-2. It can be called using

((method fib integer) 30)

A method extracted in this way does not have to be a direct method of the class; it can be an 
inherited method. 
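
A manual lookup of this kind can be pictured as a search through a table keyed by selector and class. The Python sketch below is an analogy under stated assumptions, not the compiler's actual mechanism: the tables SUPERCLASS and METHODS, the helpers defmethod, method, class_of, and send, and the describe selector are all hypothetical, and the class chain is reduced to single inheritance for brevity.

```python
# Hypothetical single-inheritance chain used only for this illustration.
SUPERCLASS = {"object": None, "magnitude": "object", "integer": "magnitude"}
METHODS = {}  # maps (selector, class name) to a Python function


def defmethod(selector, cls, fn):
    """Associate a function with a selector and a class."""
    METHODS[(selector, cls)] = fn


def method(selector, cls):
    """Manual lookup: try cls first, then each superclass in turn."""
    while cls is not None:
        if (selector, cls) in METHODS:
            return METHODS[(selector, cls)]
        cls = SUPERCLASS[cls]
    raise LookupError("no method for selector " + selector)


def class_of(value):
    """Toy classification: Python ints are integers, everything else an object."""
    return "integer" if isinstance(value, int) else "object"


def send(selector, receiver, *args):
    """Ordinary call: look up the method by the receiver's class, then call it."""
    return method(selector, class_of(receiver))(receiver, *args)


def fib(self):
    return 1 if self <= 2 else send("fib", self - 1) + send("fib", self - 2)


defmethod("fib", "integer", fib)
defmethod("describe", "object", lambda self: "an object")

extracted = method("fib", "integer")        # direct method of integer
inherited = method("describe", "integer")   # found on the superclass object
```

Here extracted(10) and send("fib", 10) compute the same value, and inherited is found even though integer defines no describe method of its own.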

Classes 

A Concurrent Smalltalk class is a type; the two words are used interchangeably in the
language definition¹. A few built-in classes are predefined; these include symbols, booleans,
integers, floating point numbers, characters, functions, and other classes. A complete list is
given in table A-2. All classes are subclasses of the class object. 

The defclass primitive can be used to add user-defined classes. A class definition consists
of a list of superclasses and zero or more new instance variables. Each instance object of that 
class contains those instance variables. The user may also define a number of methods for 
that class. A simple class that implements Lisp-like lists is shown in Figure 2-3. 

(Defclass pair (object) car cdr)

;(Defmethod car pair () car)
;(Defmethod cdr pair () cdr)
;(Defmethod get-car pair () car)
;(Defmethod get-cdr pair () cdr)
;(Defmethod put-car pair (value) :pair (set car value) self)
;(Defmethod put-cdr pair (value) :pair (set cdr value) self)

(Defun cons (first second) :pair
  (put-car-cdr (new pair) first second))

(Defmethod put-car-cdr pair (first second) :pair
  (cset car first)
  (cset cdr second)
  self)

Figure 2-3. The pair class 

The six methods that are commented out by semicolons are defined automatically by defclass (in addition to a
few others described in Section A.4). Car and get-car do the same thing; both are defined because car is
more convenient, but it cannot be used in the body of a method of class pair because static scoping shadows the
method car with the instance variable car.

The :pair constructs define the result types of the methods. They are unnecessary, but they do improve effi- 
ciency and allow rudimentary type checking. 

The class pair is defined on the first line of Figure 2-3. The defclass primitive specifies
the class name (pair), the superclasses ((object)), and the instance variables (car and
cdr).

Whenever a class C is defined, a class predicate and reader and writer methods are defined
automatically, as well as other, less-used methods described in Section A.4. The class predicate
is a function named C? that accepts one argument a and returns true if a is a member of
class C (or one of its subclasses) and false otherwise. Also, for each instance variable x of C,



¹ Nonetheless, the words type and class have slightly different meanings in the discussion of the compiler in Chapter
3.
the methods x, get-x, and put-x are defined. The first two methods take an instance object
o as an argument and return the value of x in o, while put-x takes two arguments, an
instance object o and a new value v of x, and assigns v to x in o. The methods x and get-x are
known as reader methods, while put-x is called a writer method. The writer methods return
o, the object to which the value is written.

After a class is defined, additional methods may be defined for it. In the above example, a 
method put-car-cdr is defined for the class pair. Put-car-cdr sets the value of a pair's 
car and cdr variables and returns the pair. Inside a method, the receiver's instance vari- 
ables can be accessed by their names. 
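
The automatically generated predicate, reader, and writer methods can be modeled as follows. This is a hedged Python sketch rather than a description of how defclass is implemented: the Python function defclass and the dictionary of generated operations are assumptions made for illustration, and the less-used methods of Section A.4 are omitted.

```python
def defclass(name, superclasses, instance_vars):
    """Create a class plus the predicate, reader, and writer functions for it."""
    cls = type(name, tuple(superclasses), {})
    generated = {name + "?": lambda a: isinstance(a, cls)}  # class predicate
    for var in instance_vars:
        def reader(o, var=var):
            return getattr(o, var)

        def writer(o, v, var=var):
            setattr(o, var, v)
            return o  # a writer returns the object that was written to

        generated[var] = reader           # x
        generated["get-" + var] = reader  # get-x does the same thing as x
        generated["put-" + var] = writer  # put-x
    return cls, generated


pair, ops = defclass("pair", [object], ["car", "cdr"])
# Writers return the object, so calls can be chained as in put-car-cdr.
p = ops["put-cdr"](ops["put-car"](pair(), 1), 2)
```

Because each writer returns its object, a constructor such as cons can build and return a pair in a single nested expression.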

Overriding Methods 

Consider a class c2 which is a subclass of c1. When class c2 defines a method m2 with the
same selector s as a method m1 of c1, the class c2 is said to be overriding the method m1.
When selector s is applied to an object of class c2 or one of its descendants, method m2 will be
used instead of m1.

Nevertheless, sometimes it is desirable to call m1 on an object of class c2. For example,
method m2 might want to call the method it is overriding. An overridden method m1 can be
called by performing a manual method lookup using the form (method s c1). The resulting
method can be called normally.

Type Restriction 

The type of an overriding method must be a subtype of the type of the overridden method. 
For instance, in the above example the type of m2 must be a subtype of the type of m1. This
means that both methods must have the same number of arguments, the types of the argu- 
ments of the overriding method must be supertypes (superclasses) of the types of the argu- 
ments of the overridden method, and the result type of the overriding method must be a sub- 
type (subclass) of the result type of the overridden method. If any argument of the overrid- 
den method is declared inline or using any other declaration, either explicitly or by default, 
the corresponding argument of the overriding method must have the same type and declara- 
tions. The results of violating the above rules are undefined. The compiler may issue errors 
if the above rule is violated, but it is not guaranteed to do so. 

The above restrictions apply only to methods being overridden. There are no restrictions on 
methods with the same name declared for disjoint classes (i.e. classes which are not sub- 
classes of each other). 
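
The subtype rule for overriding methods can be written as a small check. The Python sketch below is an illustration only: the classes Obj, Magnitude, and Integer and the function valid_override are hypothetical names, Python's issubclass stands in for the subclass relation, and declarations such as inline are not modeled.

```python
class Obj:
    pass


class Magnitude(Obj):
    pass


class Integer(Magnitude):
    pass


def valid_override(overriding, overridden):
    """Each signature is a pair (list of argument types, result type)."""
    new_args, new_result = overriding
    old_args, old_result = overridden
    return (
        len(new_args) == len(old_args)
        # Argument types of the overriding method must be supertypes.
        and all(issubclass(old, new) for old, new in zip(old_args, new_args))
        # The result type of the overriding method must be a subtype.
        and issubclass(new_result, old_result)
    )
```

For example, a method taking an Obj and returning an Integer may override one taking an Integer and returning a Magnitude, while narrowing an argument type or widening the result type is rejected. Since a type counts as its own subtype and supertype, identical signatures pass the check.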

The Class Object 

Methods of class object are very similar to functions. There are two main differences be- 
tween functions and methods of class object: 

• A method of class object can be overridden by a method of a more specific class. For ex- 
ample, if cons in Figure 2-3 is defined as a function, no other function or method may be 
called cons. On the other hand, if it is defined as a method of class object, it may be over- 
ridden by a method cons defined for integers. However, a method may not be overridden by 
a function. 

• A function that takes no parameters can be defined, while a method must always take at 
least one parameter — the instance object. 

In the interest of code maintenance and readability, it is recommended that functions be used 
in cases when overriding makes no sense; parameter functions to iterators fall into this cate- 
gory. On the other hand, if overriding a function might be desirable, that function should be 
defined as a method of type object. It is not clear whether overriding cons (Figure 2-3) 
would be useful, so it might be defined either as a function or a method, depending on one's
taste. 

Local Variables 

A method or a function can declare local variables using the clet or let statements or their 
derivatives. For example, the method fib from Figure 2-1 could be rewritten using two
local variables as in Figure 2-4.

(Defmethod lfib Integer ()
  (if (<= self 2)
      1
      (clet ((a (lfib (- self 1)))
             (b (lfib (- self 2))))
        (+ a b))))

Figure 2-4. Fibonacci program with local variables 

The above program is equivalent to the one in Figure 2-1 and actually compiles into the same code. 

Local variables declared with a clet or a let statement have a scope which is the body of 
the clet or let statement (except for the bindings themselves). Clet and let statements
can be nested. Local variables can be altered using a cset or a set statement; the difference 
between the two will be explained in the Concurrency section below. 

Types 

The types (i.e. classes) of various values can be declared explicitly. Such declarations serve 
three purposes: 

• Types allow the compiler to generate faster code by allowing it to perform operations 
such as method lookup at compile time. 

• The compiler can perform type checking to find simple errors such as passing a value of 
one type to a function that is expecting a value of a different type. 

• Declaring types of function parameters and results serves to document the code. 

For the purposes of type inclusion, a type is its own supertype and subtype. 

Due to the common use of generic types, the compiler's type checking is necessarily limited. 
In particular, when an expression of type t1 is assigned to a variable of type t2 or passed as a 
parameter to a function that expects type t2, the compiler usually will give an error or a 
warning if t1 is not t2, t1 is not a superclass of t2, and t2 is not a superclass of t1 . This does 
not mean, however, that the semantics of function parameter and return type declarations 
are any different from their standard interpretations — when a function parameter is declared 
type t, every value passed as that parameter must be a member of type t, and when a func- 
tion result is declared type t, the function must return a value that is a member of type t as 
that result — the only difficulty is that the compiler is not able to do full type checking, so it 
usually follows the rules outlined above. 

For example, integer and boolean are both subclasses of the object and magnitude 
classes (see Figure A-2), but they are otherwise unrelated to each other. Thus an integer 
can be passed to a function that expects an object, an object can be passed to a function 
that expects an integer, but a boolean cannot be passed to a function that expects an
integer. The second possibility, passing a more general type to a function that expects a less
general one, is included to handle the common case of extracting values from general storage
classes. One could, for example, keep a pair of integers and desire to add the pair's car and
cdr together. Since a pair is a generic data structure, it can contain values of type object; 
a compiler has no simple way of knowing at compile time that the pair will contain inte-
gers, so the best it can deduce is that the pair's car and cdr are objects. 
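
The compiler's limited check can be summarized in a few lines. The following Python sketch assumes a toy hierarchy (Obj, Magnitude, Integer, and Boolean are illustrative stand-ins for the built-in classes) and a hypothetical function accepted; it restates the rule given above and is not the compiler's code.

```python
class Obj:
    pass


class Magnitude(Obj):
    pass


class Integer(Magnitude):
    pass


class Boolean(Magnitude):
    pass


def accepted(t1, t2):
    """True unless t1 and t2 are unrelated, the case that draws a diagnostic.

    t1 is the type of the expression and t2 the declared type of the
    variable or parameter. issubclass(t, t) is True, so equal types pass.
    """
    return issubclass(t1, t2) or issubclass(t2, t1)
```

With this rule an Integer may be passed where an Obj is expected and an Obj where an Integer is expected, but a Boolean passed where an Integer is expected is flagged because neither class is a superclass of the other.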

Types can be declared as follows: 

• To specify the type of a local or an instance variable, follow the variable name with a 
colon and its type. Several locals can be declared using the same type by separating their 
names with commas. 

• To specify the type of a function or method formal, follow the formal name with a colon 
and its type. Several formals can be declared using the same type by separating their names 
with commas. 

• To specify the result type of a function or method, follow the list of formals with a colon
and the result type¹.

• A type of an intermediate result can be specified using a type-assertion statement².

The three kinds of declarations are illustrated in Figure 2-5, yet another copy of the Fi- 
bonacci program. All untyped variables, parameters, and functions and methods are typed 
object by default. 

(Defun tfib (n:integer) :integer
  (if (<= n 2)
      1
      (clet ((a:integer (tfib (- n 1)))
             (b:integer (tfib (- n 2))))
        (+ a b))))

Figure 2-5. Fibonacci program with types 

There are three type declarations here. In order, they are a declaration of the parameter type of n, a declaration of
tfib's result type, and declarations of the types of the local variables a and b.

Concurrency 

Concurrency is expressed in Concurrent Smalltalk in several ways: 

• Concurrent argument evaluation. In 

(+ (big-computation 3) (time-sink 738)) 

the expressions big-computation and time-sink can be evaluated in parallel. 

• Expressions in concurrently statements may be evaluated concurrently. The expres- 
sions in parallel statements are always evaluated concurrently. 

• The variable bindings in clet and let statements can also be evaluated concurrently. 
For example, the expressions big-computation and time-sink can be evaluated concur- 
rently in 

(cset a (big-computation 3))
(cset b (time-sink 738))
(+ a b)

as well as in

(let ((a (big-computation 3))
      (b (time-sink 738)))
  (+ a b))

• The computations in assignments using cset and in function calls whose result values 
are unused can be done concurrently with neighboring statements. 



¹ See also return values in section A.5 for a description of specifying types of multiple results.
• The computations done for futures are always evaluated in parallel.

The action of a cset can be thought of as storing a promise (known as a cfuture) to calculate 
the value of a variable. For example, after 

(cset a (big-computation 3))

is executed, a will contain either the value of (big-computation 3) or a cfuture promising
to deliver that value when it is needed. If a contains a cfuture, (big-computation 3) is
evaluated in parallel by a different task. At the same time, execution of the method can
proceed and the method can perform another time-consuming task. It will not have to wait for
(big-computation 3) to complete until the value of a is needed.
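
The promise behavior of cset resembles submitting work to an executor and reading the result only when it is needed. The Python sketch below is an analogy, not the MDP mechanism: big_computation and method_body are placeholder names, and the explicit result() call plays the role of the implicit wait that occurs when the value of a is used.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def big_computation(x):
    """Placeholder for a slow computation."""
    time.sleep(0.05)
    return x * x


def method_body(pool):
    a = pool.submit(big_computation, 3)  # like (cset a (big-computation 3))
    other = sum(range(1000))             # the method keeps working meanwhile
    return a.result() + other            # waits only here, where a is needed


with ThreadPoolExecutor(max_workers=2) as pool:
    total = method_body(pool)
```

In Concurrent Smalltalk no call corresponding to result() is written; the hardware tag on the cfuture causes the wait when the value is touched.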

Sometimes it is desirable to explicitly wait until the value of an expression is available before 
continuing. This is called either touching or forcing the expression. Touching or forcing an 
expression that evaluates to a normal value does nothing. Touching or forcing an expression 
that evaluates to a cfuture causes evaluation to wait until the value of the cfuture is avail- 
able. Finally, touching an expression that evaluates to a future does nothing, while forcing it 
causes evaluation to wait until the value of the future is available. The resulting value is 
then touched or forced again until the touch or force operation does not change it. 

An expression can be touched using the touch statement and forced using the force state- 
ment. Since built-in methods and functions usually touch or force their arguments, touching 
and forcing are rarely done explicitly. 
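
The difference between touching and forcing can be made concrete with a small model. In this Python sketch the classes CFuture and Future and their wait method are assumptions made for illustration, and waiting is modeled simply as calling a stored function rather than suspending a process.

```python
class CFuture:
    """Placeholder that both touch and force wait on."""

    def __init__(self, thunk):
        self.thunk = thunk

    def wait(self):
        return self.thunk()


class Future:
    """Placeholder that only force waits on."""

    def __init__(self, thunk):
        self.thunk = thunk

    def wait(self):
        return self.thunk()


def touch(value):
    """Wait on cfutures repeatedly; leave futures and normal values alone."""
    while isinstance(value, CFuture):
        value = value.wait()
    return value


def force(value):
    """Wait on cfutures and futures repeatedly until a normal value remains."""
    while isinstance(value, (CFuture, Future)):
        value = value.wait()
    return value
```

The loops implement the rule that the resulting value is touched or forced again until the operation no longer changes it; touching a cfuture that yields a future therefore stops at the future, while forcing it continues to the underlying value.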

The reference manual in Appendix A defines more precise semantics for what expressions 
may or may not be evaluated in parallel. 

Locks 

(defclass resource (object)
  l:lock
  ... other fields)

(defmethod init resource ()
  (cset l (new-simple-lock))   ;Creates an initially available lock
  ... other initialization code)

(defun new-resource ()
  (init (new resource)))

(defmethod access resource (parameters)
  (acquire l)
  ... code to perform the access using parameters ...
  (release l))

(defmethod access2 resource (parameters)
  (with-locks (l)
    ... code to perform the access using parameters ...))

Figure 2-6. Lock Example 

This example defines a class resource that contains a lock. Every call to access acquires the lock when it 
starts and releases it when done, so the code in the middle of the access method cannot be interrupted by an- 
other access method. The with-locks macro is a convenient shorthand for acquiring and releasing locks; the 
access method could have been rewritten as access2. 

Locks are used to synchronize computation by Concurrent Smalltalk programs. Locks are 
especially useful around critical sections of code where only one process may access a re- 
source; a process that wants the resource acquires a lock before accessing the resource and 
releases it when it is done. Two variants of locks are provided. Simple-locks are fast locks 
which, however, perform poorly when many processes are waiting for a resource; simple- 



² See section A.6.
locks should be used in situations in which the probability of contention for a resource is
small. Queueing-locks are slower locks designed to handle a large amount of contention. 

As an example of the use of locks, suppose one wants to restrict the use of a resource so that 
only one process can access it at a time. To accomplish this exclusion, a lock can be associ- 
ated with the resource, in which case every process should acquire the lock before using the 
resource and release it when done. Figure 2-6 shows sample code used to access the resource. 
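
The acquire and release discipline has a direct analogue in most threaded languages. The Python sketch below mirrors the shape of the access and access2 methods using the standard threading module; the Resource class, its count field, and run_workers are illustrative assumptions, and Python's lock does not distinguish simple-locks from queueing-locks.

```python
import threading


class Resource:
    def __init__(self):
        self.lock = threading.Lock()  # created in the available state
        self.count = 0

    def access(self):
        """Explicit acquire and release around the critical section."""
        self.lock.acquire()
        try:
            self.count += 1
        finally:
            self.lock.release()

    def access2(self):
        """The with statement plays the role of the with-locks shorthand."""
        with self.lock:
            self.count += 1


def run_workers(resource, workers=4, rounds=1000):
    """Have several threads use the resource, then report the final count."""
    def work():
        for _ in range(rounds):
            resource.access()
            resource.access2()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return resource.count
```

Because every update happens while the lock is held, no increments are lost regardless of how the threads interleave.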



Distributed Objects 



(defclass distarray (distobj)
  value)

(defun new-distarray (size:integer)
  (new distarray size))

(defmethod get distarray (index:integer)
  (get-value (co group index)))

(defmethod put distarray (index:integer new-value)
  (set (get-value (co group index)) new-value))

(defmethod size distarray ()
  (logical-limit self))

Figure 2-7. Distributed Object Example 

This example defines a class distarray used for distributed arrays. The get method returns the element at 
position index in the array; since each constituent contains only one element of the array, the get method returns 
the value in the constituent specified by the given index. Similarly, the put method routes the message to the 
constituent specified by index, where it stores new-value. The size method simply returns the array's size. 

Whereas standard objects serialize messages sent to them¹, distributed objects can accept
and process many messages at a time. A distributed object is composed of an array of
constituent objects and a common group name. When a message is sent to the group name, the
operating system routes it to a constituent of its choosing. The constituent can then process 
the message or send it to another constituent; constituents know how to address each other. 
The co primitive is used to find a particular constituent of a distributed object, while the 
group instance variable can be read to determine the group name of a distributed object 
given one of its constituents. 

For example, a large array might be implemented as a distributed object. When a get mes- 
sage is sent to the array to read a value of a particular element, the message is routed to one 
of the constituents. That constituent examines the given index and forwards the message to 
the constituent containing the element, which reads and returns the value. 
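
The routing just described can be sketched as follows. This Python model is an assumption-laden illustration, not the Cosmos implementation: the DistArray and Constituent classes and the send method are hypothetical, random.choice stands in for the operating system choosing a constituent, and everything runs in one address space rather than across MDP nodes.

```python
import random


class Constituent:
    """One element of the distributed array, as in Figure 2-7."""

    def __init__(self, group, index):
        self.group = group
        self.index = index
        self.value = None

    def get(self, index):
        # Forward to the constituent that holds the requested element.
        return self.group.co(index).value

    def put(self, index, new_value):
        self.group.co(index).value = new_value

    def size(self):
        return len(self.group.constituents)


class DistArray:
    """Stands in for the group name shared by all constituents."""

    def __init__(self, size):
        self.constituents = [Constituent(self, i) for i in range(size)]

    def co(self, index):
        """Find a particular constituent, like the co primitive."""
        return self.constituents[index]

    def send(self, selector, *args):
        """Route a message sent to the group to an arbitrary constituent."""
        target = random.choice(self.constituents)
        return getattr(target, selector)(*args)


arr = DistArray(8)
for i in range(8):
    arr.send("put", i, i * i)
```

Whichever constituent first receives a get or put message, the result is the same, which is what allows many messages to be in flight at once without a single serializing object.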

Figure 2-7 shows a simple example of the use of distributed objects to create a distributed ar- 
ray. Each constituent contains only one element of the array to keep this example short; a 
better implementation would use a simple-array at each constituent to reduce the number 
of constituents needed. 

The advantage of using a distarray class like the one in Figure 2-7 is that many accesses
can be made to the array simultaneously; they do not have to pass through a common bottle- 
neck to access the array. In addition, as will be clarified in Section 3.3, the get and put 
methods do not access any instance variables of distarray themselves, so they could be
inlined wherever they are called²; thus, reading or writing the distarray in Figure 2-7 could



¹ Except for a few special cases such as immutable objects and messages which do not need to access an object's data
to execute, only one message may be processing on a standard object at a time. 

² The compiler's handling of group would have to change a little to permit this optimization; the compiler currently
treats group solely as an instance variable, but there is no intrinsic reason why the compiler could not provide a by- 
pass path that checks whether a method was called on a group ID (as opposed to a constituent ID) and just uses the 
involve only two message sends, which is no less efficient than reading or writing a
simple-array.

Macros 

Concurrent Smalltalk provides a macro facility which can be used to extend the language. A 
macro consists of a pattern and a replacement. The pattern can contain variables or key- 
words. If it matches with an expression, that expression is replaced by the replacement, 
which can be either another pattern or a Common Lisp function¹. Much of the language
itself has been implemented in terms of macros. Figure 2-8 contains a sample macro which
defines a when form that is the equivalent of a Common Lisp when.

(defmacro (when ?test . ?body)
  (if ?test
      (begin . ?body)))

Figure 2-8. When macro 

The when form defined by this macro takes a test and a number of statements comprising the body. If the test is 
true, the statements are executed one after another, as in begin. If the test is false, when returns nil. This 
macro takes advantage of the fact that if returns nil if there is no else-clause and the condition is false. The 
Lisp dot notation is used to indicate that the body forms the rest of the given list. 
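
Pattern-and-replacement expansion of this kind can be sketched with nested lists. The Python code below is a simplified model, not the Optimist II macro expander: expressions are Python lists, variables are strings beginning with ?, the string "." before the last element marks a rest pattern, and the names match, substitute, expand, and WHEN are all illustrative.

```python
def match(pattern, expr, bindings):
    """Match expr against pattern, recording variable bindings."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        bindings[pattern] = expr
        return True
    if isinstance(pattern, list):
        if not isinstance(expr, list):
            return False
        if len(pattern) >= 2 and pattern[-2] == ".":
            head = pattern[:-2]  # fixed part before the dot
            if len(expr) < len(head):
                return False
            return (all(match(p, e, bindings) for p, e in zip(head, expr))
                    and match(pattern[-1], expr[len(head):], bindings))
        return (len(pattern) == len(expr)
                and all(match(p, e, bindings) for p, e in zip(pattern, expr)))
    return pattern == expr  # keywords and constants must match exactly


def substitute(template, bindings):
    """Build the replacement by filling in the bound variables."""
    if isinstance(template, str) and template in bindings:
        return bindings[template]
    if isinstance(template, list):
        if len(template) >= 2 and template[-2] == ".":
            head = [substitute(t, bindings) for t in template[:-2]]
            return head + list(substitute(template[-1], bindings))
        return [substitute(t, bindings) for t in template]
    return template


WHEN = (["when", "?test", ".", "?body"],
        ["if", "?test", ["begin", ".", "?body"]])


def expand(expr, macro):
    """Rewrite expr if it matches the macro's pattern; otherwise return it."""
    pattern, replacement = macro
    bindings = {}
    if match(pattern, expr, bindings):
        return substitute(replacement, bindings)
    return expr
```

Applied to a when form with a test and two body statements, expand produces an if form whose single branch is a begin containing both statements; an expression that does not start with when is returned unchanged.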



group ID if it was provided instead of always using the group instance variable. When this optimization is imple- 
mented, distributed arrays such as the one above will be as efficient as simple arrays. 

¹ Concurrent Smalltalk functions may be added as replacements later, when the entire compiler and development
system is rewritten in Concurrent Smalltalk.
Optimist II is an optimizing compiler for the Concurrent Smalltalk language described in
Appendix A. The compiler generates assembly language code for the Message-Driven Pro- 
cessor. 

Optimist II is based on the Optimist compiler described in [21]. Optimist included many 
standard optimizations such as register variable assignment, dataflow analysis, copy propa- 
gation, and dead code elimination [3] [43] that are used in compilers for conventional proces- 
sors. In addition, Optimist included fork and join mergers that try to merge similar (not nec- 
essarily identical) statements on both sides of conditionals, a powerful move eliminator, and 
numerous code generator optimizations to accommodate various idiosyncrasies of the MDP. 

Optimist II is a substantial improvement over the Optimist compiler. While Optimist sup- 
ported only a small subset of an early Concurrent Smalltalk language, Optimist II imple- 
ments almost the entire new Concurrent Smalltalk language. Some language features sup- 
ported by Optimist II that were not present in the original Optimist include: 

Method lookup (Optimist could compile method code but could not associate a method 
with a selector) 

Global variables 

Class and variable declarations 

Macros 

Lambdas and closures 

Multiple inheritance of classes 

Distributed objects 

Multiple return values 

Nonlocal exits 

Functions 

Methods referencing more than one object at a time 

Synchronization primitives 

Arrays 

Methods overriding primitive selectors such as + 

Compile-time evaluation of expressions

Furthermore, Optimist II contains an interactive language environment, including a Concur- 
rent Smalltalk interpreter and facilities to view code in various stages of compilation. Opti- 
mist II gives helpful warnings and errors when it encounters questionable language con- 
structs. It also includes entire new categories of optimization, including type inference and 
global program optimizations. Finally, Optimist II's code generator has been updated to con-
form to and optimize for MDP Architecture version 11B [16] 1 instead of Optimist's Architec-
ture 10 [23].



1 This reference is to MDP Architecture version 11. Version 11B has not been published yet.

The only language features listed in Appendix A missing from Optimist II are full futures
and I/O facilities. It is expected that they will be added later, when the operating system is 
updated to support them. In addition, some optional features of the language such as inline 
objects and first-class continuations have not been implemented, although facilities have 
been provided that will simplify their implementation in the future. 

Structure 

Figure 3-1 shows the overall structure of the compiler. Concurrent Smalltalk code is read 
and parsed by the reader and parser, transformed by the preoptimizer, and saved in the 
global environment. It can be either interpreted using the global environment or optimized 
further by the optimizer and then compiled into MDP assembly code by the compiler and 
assembler. The treewalker controls the compilation process and prevents unused modules 
and objects from being compiled and assembled. 

Reading Guide 

The Data Structures section introduces the common data structures used in the Optimist II 
compiler. A few data structures such as digraphs and hcode appear throughout the compiler, 
and familiarity with them is assumed in the later sections. 

The next three sections discuss the three main components of the compiler environment: The 
Initial Phase includes facilities to read Concurrent Smalltalk expressions and compile them 
into hcode (an intermediate code format), interpret that hcode, and maintain the global Con- 
current Smalltalk environment. This phase executes until the user requests a compilation of 
the program to MDP assembly code, at which time the other two phases are invoked. Most of 
the optimizations in Optimist II are done in the Optimization phase, although a few appro- 
priate optimizations are scattered in the other phases. The Code Generation phase com- 
piles the optimized hcode into MDP assembly language and outputs that assembly language, 
together with immediate objects, class descriptors, and method tables, after performing a few 
final optimizations. The output of the Code Generation phase can be read directly into 
MDPSim. The code generator and MDPSim share the task of linking programs. Finally, the 
Summary section summarizes the important ideas in the compiler. 

Chapter 5, Sample Program, shows the progress of a sample program through various 
phases of the compiler, and it may be helpful to illustrate some of the optimizations. 



3.1. Data Structures

Utilities 

Optimist II uses a number of supporting data structures throughout the compilation process. 
These include abstractions such as environments, queues, ordered sets, bit sets, and exten- 
sions to CLOS. The supporting data structures are defined in the System Utilities, Utilities, 
and Digraph files 1 . 

An environment associates keys with values. Environments can be atomic, linked to each 
other, and either simple or based on hash tables. Atomic environments allow a series of 
changes to be cancelled, which is a useful operation if a syntax error is found in the input 
Concurrent Smalltalk expression. See the System Utilities file for more information about the 
internal Optimist II environment formats. 
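
As an illustration only, the sketch below models a linked, atomic environment in Python. The class
and method names (Environment, begin, commit, cancel) are assumptions made for this example; the
actual formats are those defined in the System Utilities file.

```python
class Environment:
    """Toy key/value environment with an optional link and undoable changes."""

    _MISSING = object()  # marks "key was not defined before the change"

    def __init__(self, parent=None):
        self.table = {}
        self.parent = parent    # searched when a key is not found here
        self.undo_log = None    # list of (key, old value) while changes are atomic

    def lookup(self, key):
        env = self
        while env is not None:
            if key in env.table:
                return env.table[key]
            env = env.parent
        raise KeyError(key)

    def define(self, key, value):
        if self.undo_log is not None:
            self.undo_log.append((key, self.table.get(key, self._MISSING)))
        self.table[key] = value

    def begin(self):
        """Start recording changes so that they can be cancelled."""
        self.undo_log = []

    def commit(self):
        """Keep the recorded changes."""
        self.undo_log = None

    def cancel(self):
        """Undo every change made since begin, most recent first."""
        for key, old in reversed(self.undo_log or []):
            if old is self._MISSING:
                del self.table[key]
            else:
                self.table[key] = old
        self.undo_log = None
```

In this sketch a reader that hits a syntax error halfway through an expression would call cancel,
leaving the environment exactly as it was before the expression was read.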

The implementation of digraphs (directed graphs) is discussed in [21]. The digraphs in Op- 
timist II extend that implementation by taking advantage of CLOS's class inheritance mech- 
anism and by automatically marking a digraph altered when any change to it is made, elimi- 
nating some hard-to-find consistency errors. For example, using the dfs function to ask for a 
listing of the nodes of a digraph will always yield an up-to-date list. Furthermore, an Opti- 
mist II digraph has a root dinode that is attached to both the digraph's starting nodes and 
the ending nodes, allowing easy identification of the digraph's exit points. Including the root 
node generalizes some algorithms; for example, the join merger can now join statements at 
the end of the digraph. 

The traversal returned by dfs is not quite a depth-first search— the search order is depth-first 
modified to avoid listing a node ahead of its predecessors whenever possible. If the graph is 
acyclic, no node (except the root) is listed before its predecessors. The digraph dataflow prob- 
lem solver [21] [3] has been updated to detect this condition and solve a dataflow problem on 
a digraph in one pass if the digraph is acyclic; otherwise, the dataflow solver makes two or 
more passes until no node changes. Moreover, dfs automatically detects and removes dead 
dinodes from a digraph; dead dinodes are dinodes which cannot be reached by following the 
edges in the digraph in the forward direction starting at the root, but which can be reached 
from the root by following edges in the undirected digraph. 
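
The effect of this ordering on the dataflow solver can be sketched as follows. The example is in
Python and uses reverse postorder, a standard ordering that lists every node after its predecessors
in an acyclic graph; it is a stand-in for, not a copy of, the dfs function. The root here is simply
the entry node, the function names are hypothetical, and the dataflow problem (a forward union of
gen sets) was chosen only for illustration.

```python
def reverse_postorder(succ, root):
    """List the nodes reachable from root, predecessors first where possible."""
    seen, order = set(), []

    def visit(node):
        seen.add(node)
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                visit(nxt)
        order.append(node)

    visit(root)
    order.reverse()
    return order  # nodes unreachable from root never appear: they are dead


def solve_forward(succ, root, gen):
    """Compute out[n] = gen[n] | union(out[p] for each predecessor p of n)."""
    order = reverse_postorder(succ, root)
    position = {node: i for i, node in enumerate(order)}
    preds = {node: [] for node in order}
    for node in order:
        for nxt in succ.get(node, ()):
            preds[nxt].append(node)
    # The graph is acyclic exactly when every edge goes forward in the order.
    acyclic = all(position[n] < position[s]
                  for n in order for s in succ.get(n, ()))
    out = {node: set() for node in order}
    passes = 0
    while True:
        passes += 1
        changed = False
        for node in order:
            new = set(gen.get(node, ()))
            for p in preds[node]:
                new |= out[p]
            if new != out[node]:
                out[node], changed = new, True
        if acyclic or not changed:
            return out, passes
```

For an acyclic graph the solver returns after a single pass; a graph with a loop needs further
passes until no node changes.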

The other structures based on digraphs such as modules are similar to those in Optimist. 
See the Digraph file for more details, including the dataflow problem solver and a directed 
graph mapper utility. 

Hcodes 

Hcode is the primary intermediate code format of the Optimist II compiler. It is loosely 
based on I-code found in Optimist. Hcodes 2 are represented by instance objects of CLOS 
classes, and there is no uniform syntax for reading and writing programs in hcode form, 
although the show utility prints hcodes fairly well. In addition, the usage of hcodes is not 
uniform throughout the compiler. The sets of hcodes allowed in different stages of the com- 
piler differ— some hcodes are used early and then banned, while others are introduced just 
before assembly code is generated. The number of hcodes used in the compiler is small and 
fixed: there are only thirteen hcodes, and nine of them are limited to certain phases of the
compiler. Since there are few hcodes, most operations can be expressed in only one way in
hcode, and the optimization algorithms have to handle only a few cases instead of many syn- 
onymous I-codes, as used in Optimist. 



1 See Appendix E for information on getting copies of the files.

Table 3-1. Hcodes

HCode           Arguments                   Usage*               Action

Directive       Directive, Directive-args   I, Pre               Evaluate top-level directive such as add-method
                                                                 on the directive-args.
Application     Targets, Funct, Args        I, Pre, O, Post, C   Apply funct to args and put the result values in
                                                                 the targets.
Assert-Type     Argument, Type              I, Pre, O, Post      Assert that the argument value's type is a
                                                                 subtype of type.
Move            Target, Source              I, Pre, O, Post, C   Move value from source to target.
Make-Closure    The-Lambda, Sources         I, Pre, O, Post      Make a closure out of the-lambda using sources
                                                                 as the values of the display arguments.
Nconcurrently   Threads                     I, Pre               Execute threads concurrently.
If              Condition, Argument         I, Pre, O, Post, C   Branch if argument satisfies condition. Table 3-2
                                                                 lists the allowed conditions.
Touch           Argument                    I, Pre, O, Post, C   Touch the argument.
Force           Argument                    I, Pre, O, Post      Force the argument.
Make-Future     Target, Argument, Lazy      I, Pre, O, Post      Make a future which will evaluate the lambda
                                                                 passed as an argument. Store the future in
                                                                 target. The future is lazy if lazy is true.
Enter                                       I, C                 Commence function or method execution.
Exit                                        C                    Terminate function or method execution.
Grab            Argument                    C                    Temporarily dereference an instance object.

*The Usage column specifies the stages of the compiler in which the hcode is valid. The stages are:

I      Hcode before initial transformations.
Pre    Pre-optimized hcode. This hcode is stored in the global environment.
O      Hcode during most of the main optimization phase.
Post   Hcode during the MDP-specific post-optimization phase.
C      Hcode just before it is compiled into MDP assembly language.



Table 3-1 lists the hcodes. Most hcodes contain fields such as arguments and targets. An ar- 
gument field can contain any rvalue 1 , while a target field can contain any lvalue. Also, a type 
field can contain any type, while a class field requires a class. The formats of those fields are 
listed in Table 3-3. 

There is no hcode that returns a value from a function or a method. Instead, a special lvalue 
is used to represent a continuation to the caller. A value is returned by storing it using a ref- 
erence to the continuation as a target. Thus, a move hcode with a reference to a continuation 
as a target is really a return statement, while an application hcode with a reference to a con- 
tinuation as a target is a tail-forwarded application. More complicated combinations are also 
permitted — an application hcode that returns two values can forward one to a continuation 
and store the other in a local variable, or continuations to several different callers within 
whose static scopes a function resides could be used. The benefits of not including a return 
hcode are a more orthogonal set of hcodes and a simplification in the tail forwarder, which 
now becomes a somewhat specialized move eliminator. 
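
A minimal sketch of this idea, written in Python with hypothetical class names, is shown below. It
assumes that the local holding the result is not used after the return, and it is not the
representation used in the HCode file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Local:
    name: str


@dataclass(frozen=True)
class ContinuationRef:
    name: str = "caller"


@dataclass(frozen=True)
class Move:
    target: object
    source: object


@dataclass(frozen=True)
class Application:
    targets: tuple
    funct: str
    args: tuple


def is_return(hcode):
    """A move whose target is a continuation reference acts as a return."""
    return isinstance(hcode, Move) and isinstance(hcode.target, ContinuationRef)


def is_tail_forward(hcode):
    """An application that stores a result into a continuation reference."""
    return isinstance(hcode, Application) and any(
        isinstance(t, ContinuationRef) for t in hcode.targets)


def forward_tail(app, ret):
    """Eliminate the move in 'app; return local' by retargeting the application.

    Assumes the local is dead after the return. Returns None if no merge applies.
    """
    if is_return(ret) and ret.source in app.targets:
        targets = tuple(ret.target if t == ret.source else t for t in app.targets)
        return Application(targets, app.funct, app.args)
    return None
```

Here an application that produces two values can end up forwarding one of them to the caller's
continuation while still storing the other in a local variable.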

Every hcode has exactly one successor in the digraph except the if hcode, which has two, cor-
responding to evaluating the conditional as true or false. The nconcurrently hcode has only
one successor, but it also contains a set of nested digraphs, which may be evaluated concur-
rently, sequentially, or interleaved in any fashion. There is no restriction on the number of
predecessors an hcode can have.

2 Sometimes the word statement will also be used to refer to an hcode.
1 Rvalues are defined below.

Table 3-2. Conditions

Condition   Expression

:true       Branch if the argument is true. The argument must be a boolean.
:false      Branch if the argument is false. The argument must be a boolean.
:nil        Branch if the argument is eq to nil. The argument can be any object.
:non-nil    Branch if the argument is not eq to nil. The argument can be any object.
:zero       Branch if the argument is equal to 0 using the = predicate.
:non-zero   Branch if the argument is not equal to 0 using the = predicate.

Hcodes are rarely processed alone; usually hcodes are embedded in a code-lambda or cst- 
lambda, which represent digraphs of hcodes with header information. A code-lambda con- 
tains a digraph of hcode statements together with a database of local variables used by those 
statements. Each local variable has an optional name, a type, and some declarations such as 
whether it can hold inline objects. Furthermore, the locals in a code-lambda are consecu- 
tively numbered to allow the efficient use of bitmaps to keep track of variable data while 
solving dataflow problems. In addition, a code-lambda shares with cst-functions (another in- 
ternal Optimist II class that describes all functions, including primitives) the interface fields 
which consist of a list of parameters, return values, and display variables used by closures. 

Hcodes are documented in the HCode file. 

Values 

An Optimist II value is a representation of a Concurrent Smalltalk object — it can be, say, an 
integer, a character, a distributed object, a function, a class, or any other valid Concurrent 
Smalltalk object. On the other hand, a variable or a parameter is not a value, but it may con- 
tain a value. In addition, values of a few hidden types such as continuations and continua- 
tion displacements are also used. Many different representations are used for values, and 
these representations will not be described further here; please refer to the Types file for 
more details on this subject. 

An rvalue can be either a value or a location that can be read to obtain a value. Thus, a local 
or a global variable is an rvalue, and so is the Concurrent Smalltalk integer 7. An instance 
variable in general is not an rvalue, but a reference to an instance variable in a particular in- 
stance object is. The common rvalue kinds are listed in Table 3-3. 

Table 3-3. Rvalues

Rvalue     Specializers         Notes

Value                           Any value is also an rvalue.
Local      Name, scope, etc.    A local variable.
Global     Name                 A global variable.
Option     Name                 A Concurrent Smalltalk option.
Ivar-ref   Instance variable,   An instance variable of an instance object. The
           Instance object      instance object must also be an rvalue.

An lvalue is a location into which a value can be written. Examples of lvalues include local
variables, references to instance variables in instance objects, and references to continua-
tions. A continuation by itself is not an lvalue, but a reference to one is. The common lvalue
kinds are listed in Table 3-4.

Table 3-4. Lvalues

Lvalue             Specializers        Notes

Local              Name, scope, etc.   A local variable.
Global             Name                A global variable.
Continuation-ref   Continuation or     A reference to a continuation specified either as
                   Context and         a continuation rvalue or as a pair of context and
                   Displacement        displacement rvalues (see Section 3.3).



All rvalues are instances of the rvalue CLOS class, all lvalues are instances of the lvalue 
CLOS class, and all values are instances of the value and rvalue CLOS classes. CLOS's mul- 
tiple inheritance is used to define objects that are both rvalues and lvalues or other combina- 
tions of the above. 

Types and Classes 



Table 3-5. Types

Type                Specializers        Notes

Class                                   Any class is also a type.
Continuation-type   Continuation-type   A type based on the continuation class that
                                        represents a continuation that will return a
                                        value of the continuation-type type.
Displacement-type   Continuation-type   A type based on the displacement class that
                                        represents a displacement field of a continuation
                                        that will return a value of the continuation-type
                                        type.

A Concurrent Smalltalk class is a Concurrent Smalltalk value that is an instance of the 
class class. Classes are implemented in Optimist II as instances of the CSt-class CLOS 
class. In addition to itself being a value, a class also represents a set of values. For example, 
the class integer represents the set of all integers, which includes, among others, the values 
4 and -17. The class null represents the singleton set {nil}. The class class represents
the set of all Concurrent Smalltalk classes, including itself. 

In addition to classes, Optimist II includes types which provide finer discrimination than 
classes for describing sets of values. Types are listed in Table 3-5. Currently a type is either 
a class or a continuation that returns an object of some type. A type can always be projected
to a class; the base-class Lisp generic function performs this conversion. A type that is also a 
class projects to itself, while a continuation type projects to the class continuation. 
Although a class is always a value, a type is not necessarily a value. 

Multitypes 

When describing the possible contents of variables, Optimist II uses the concept of a multi- 
type. A multitype is a list of zero or more types; a value is a member of a multitype (satisfies 
that multitype) if it is a member of one of its types. No value satisfies a null multitype, while 
every value satisfies a multitype that has object as one of its types. Routines are provided 
to calculate unions (least upper bound) and intersections (greatest lower bound) of multi- 
types and simplify representations of multitypes. Since multitypes are not necessarily closed 
under those operations, the lub and glb routines may conservatively enlarge their multitype
results. 
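
The following Python sketch shows one way to read these definitions. It assumes a small
single-inheritance class table invented for the example (Concurrent Smalltalk itself allows
multiple inheritance), treats a multitype as a list of class names, and uses hypothetical function
names. It does not model the conservative enlargement mentioned above.

```python
# Hypothetical class table: each class maps to its single superclass.
PARENT = {"object": None, "number": "object", "integer": "number",
          "float": "number", "null": "object"}


def is_subclass(cls, ancestor):
    """True if cls is ancestor or inherits from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT[cls]
    return False


def satisfies(cls, multitype):
    """A value of class cls satisfies a multitype if it is a member of one of its types."""
    return any(is_subclass(cls, t) for t in multitype)


def simplify(multitype):
    """Drop duplicates and types already covered by another type in the list."""
    out = []
    for t in multitype:
        covered = any(is_subclass(t, u) for u in multitype if u != t)
        if not covered and t not in out:
            out.append(t)
    return out


def lub(a, b):
    """Union of two multitypes."""
    return simplify(a + b)


def glb(a, b):
    """Intersection of two multitypes; unrelated classes share no values here."""
    out = []
    for t in a:
        for u in b:
            if is_subclass(t, u):
                out.append(t)
            elif is_subclass(u, t):
                out.append(u)
    return simplify(out)
```

With this table, no class satisfies the empty multitype, and every class satisfies a multitype that
contains object.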
Global Data Structures

Two atomic environments, the global environment and the class environment, contain most 
of the state of the Concurrent Smalltalk interpreter. The global environment contains all 
Concurrent Smalltalk globals, parameters, and constants, while the class environment con- 
tains all known Concurrent Smalltalk classes. The global environment is linked to the class 
environment, so the latter is searched if an identifier is not found in the global environment. 

The classes are themselves heavily linked together. Each class object has lists of its immedi- 
ate superclasses and subclasses and all of its superclasses and subclasses, as well as a meta- 
class, a description of its instance variables, and sundry options such as whether the class is 
immutable. To allow typed recursive data structures, an "undefined" class structure is cre-
ated when a class name is encountered in a program without being defined. An "undefined"
class can turn into a normal class when the class is defined; CLOS's change-class construct is 
very valuable here. A substantial number of classes have to be updated whenever a new 
Concurrent Smalltalk class is defined, but compilation speed does not seem to suffer because 
of this. The heavy linking of classes made defining a bootstrapping subset of Concurrent 
Smalltalk classes challenging; some CLOS objects had to be created with the wrong classes 
and then transformed to the right classes. Once the bootstrapping subset of Concurrent
Smalltalk classes was defined, defining the remaining classes on top of it was easy.

A method is associated with both a class and a selector. There is no single method table in 
Optimist II; instead, whenever a method is added, it is added to the selector's list of methods 
hashed by class and the class's list of methods hashed by selector. Thus, a selector knows all 
of the methods defined for it, as does a class. Methods are not replicated in these hash tables 
unless a method is added more than once; instead, the lookup-method function, which returns 
a method associated with a class and a selector, searches the superclasses when a method is 
not defined for a selector and a class; an ambiguous selector error is signalled if there is more 
than one superclass and they are associated with differing methods. 
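
The lookup rule can be sketched in Python as follows. The class table, the method names, and the
AmbiguousSelector exception are invented for the example; only the overall behavior (two indexes,
search of superclasses, and an error when superclasses disagree) follows the description above.

```python
class AmbiguousSelector(Exception):
    """Raised when superclasses supply differing methods for a selector."""


# Hypothetical class table: each class maps to its immediate superclasses.
SUPERS = {"object": [], "a": ["object"], "b": ["object"], "c": ["a", "b"]}

methods_by_selector = {}   # selector -> {class: method}
methods_by_class = {}      # class -> {selector: method}


def add_method(cls, selector, method):
    """Record the method under both its selector and its class."""
    methods_by_selector.setdefault(selector, {})[cls] = method
    methods_by_class.setdefault(cls, {})[selector] = method


def lookup_method(cls, selector):
    """Return the method for (cls, selector), searching superclasses if needed."""
    own = methods_by_class.get(cls, {})
    if selector in own:
        return own[selector]
    found = []
    for sup in SUPERS.get(cls, ()):
        method = lookup_method(sup, selector)
        if method is not None and method not in found:
            found.append(method)
    if len(found) > 1:
        raise AmbiguousSelector(f"{selector} is ambiguous for class {cls}")
    return found[0] if found else None


add_method("object", "print", "object-print")
add_method("a", "size", "a-size")
add_method("b", "size", "b-size")
```

In this example class c inherits print unambiguously, because both of its superclasses lead to the
same method, but looking up size on c raises the ambiguity error.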

Current settings of the options are also kept in a global data structure. Each option is de- 
clared as a dynamic Lisp variable, and a list of all options and their default values is kept in 
an object. The #&name reader macro expands into a reference of the option named name. 

Concurrent Smalltalk symbols are not accumulated in any data structure; however, when a 
Lisp symbol is used as a Concurrent Smalltalk symbol, its cst-symbol property is set to the 
Concurrent Smalltalk symbol object to ensure that that object is reused if the symbol is ref-
erenced again; otherwise, (eq 'sym 'sym) would be false according to the interpreter.
Number objects are not reused, so (eq 13 13) is false according to the interpreter 1 , but
(clet ((x 13)) (eq x x)) is true.



1 Nevertheless, compiled code will currently return true if eq is used to compare two equal integers. The action of
eq on numbers is purposely not defined in Concurrent Smalltalk to allow an implementation of a bignum package.

3.2. Initial Phase

The initial phase of the compiler reads the Concurrent Smalltalk input and converts it into a 
rough hcode form. Several early transformations have to be done on the resulting hcode be- 
fore it becomes suitable for optimizations. 

The most complicated early transformations create statically scoped functions. The initial 
phase determines parameter interfaces for lexical variable displays [3] used by closures, and 
it does a considerable amount of work to pick those interfaces well. Delaying this decision 
would have made manipulation of functions in that stage very difficult; the advantages of 
splitting nested functions into components early are that every function is self-contained and 
completely owns its local variables — no other function can alter or examine the local vari- 
ables. 

Reader 

A customized Common Lisp reader is used to read the Concurrent Smalltalk programs. The 
customizations consist of using a special readtable and reading all Concurrent Smalltalk
names into the CST package. The readtable is used to implement the special characters in
the Concurrent Smalltalk syntax. Most special characters expand into lists; for example, !a
expands into (! a). Some character tokens such as :, ::, and , (comma) expand into sym-
bols with the same names.

The CST package is used to prevent conflicts between Concurrent Smalltalk symbols and any 
symbols the compiler or the Common Lisp environment might be using. For instance, nil is 
just the name of a constant (which happens to have the value 'nil) in Concurrent Smalltalk;
nil is not confused with the Lisp nil, which also represents an empty Lisp list. Since the 
colon has a special readtable meaning in Concurrent Smalltalk mode, Concurrent Smalltalk 
symbols are restricted to the CST package. 

Read macros have been inserted into both the Common Lisp readtable and the Concurrent 
Smalltalk one to facilitate easy switching between the two tables. The #$ macro in standard
Lisp input reads the next token in Concurrent Smalltalk mode, while #" can be used inside a 
#$-expression to switch back to Lisp mode. In addition, the #L macro in Concurrent 
Smalltalk mode reads a list expression and returns a two-element list with the symbol lisp as 
its first element and the expression read as the second. 

Parser 

The parser parses the input expressions into a prototypical hcode form. The parser is a re- 
cursive descent macro evaluator. Each primitive in Concurrent Smalltalk is implemented as 
a macro. There are three main kinds of macros: normal macros substitute Concurrent 
Smalltalk text with other literal Concurrent Smalltalk text as described in Section A.14, non-
terminal macros substitute Concurrent Smalltalk text with Concurrent Smalltalk text pro-
duced by a Lisp function, and terminal macros read Concurrent Smalltalk text and perform
an action such as emitting hcodes. Furthermore, macros can be restricted to evaluate at the 
top level only. 

The parser, when asked to parse an expression, compares it against macros in its macro list 
in reverse chronological order until it finds a match; when a match occurs, the macro is ex- 
panded as above. If the macro was not a terminal one, the resulting text is expanded again 
until either no macro matches the text or a terminal macro is expanded. If no macro applies, 
the text must be a symbol, which is looked up in the current lexical environment. If the sym-
bol is not found in the current environment, it is assumed to be an undefined global unless it
happens to be one of the Concurrent Smalltalk primitive names or the warn-free-references 1
declaration is in effect, in which case an error or a warning is given.

Macro Implementation 

Since the parser is an intensive user of macros, a fast implementation of macros is used to 
make the parser in the compiler fast. Macros are stored in linked lists hashed by the first 
non-variable symbol in the macro pattern; macros with no such symbols are stored in a sepa- 
rate list. Thus, relatively few macros have to be examined for a given piece of Concurrent 
Smalltalk text. Furthermore, the macros themselves are compiled Lisp functions that check 
that their patterns are satisfied and, if so, compute the text replacement or perform their 
terminal actions. Compiling macros avoids the costly interpreted unification step during pat- 
tern matching. The make-macro-text function in the Environment file compiles a macro into a 
Lisp function. 

If a macro contains an @ directive in its pattern, the macro expander calls itself recursively 
on the text matching the @ directive. In this case it does not allow terminal macro expansion 
on that text. 

Environments 

While the parser is generating code, it frequently needs to determine the meanings of identi- 
fiers. It uses linked environments to keep track of statically scoped identifiers such as the 
names of local variables and continuations. The last local environment is linked to the global 
environment to cause a search of the global and class environments when an identifier is not 
defined locally. Optimist II distinguishes local variables according to whether they are eq to 
each other or not. Thus, no alpha-renaming is necessary anywhere in the parser. Also, a 
lambda may reference local variables it captured from an enclosing lambda. Since most of 
the optimizations cannot handle externally visible local variables, such local variables are 
"unshared" before the optimization pass is invoked. 

Concurrent Smalltalk Runtime 

Most of the Concurrent Smalltalk directives described in Appendix A are macros which ex- 
pand into either other Concurrent Smalltalk primitives or hidden primitives. The Runtime 
file contains a listing of all macros used by Concurrent Smalltalk. 

Top-Level Primitives 

Most Concurrent Smalltalk top-level primitives listed in Appendix A expand into the directive 
hcodes and are evaluated at expression interpretation time. Directive hcodes may be inter- 
preted but not compiled; to ensure that no directive will be compiled, directives are prohibited 
inside lambdas (and, of course, any constructs which expand into lambdas). A few directives 
such as include, top-level set, and defclass 2 are evaluated by the reader; those directives
must be placed at the top level— they may not be nested in any expression except a top-level 
begin, which evaluates its arguments sequentially at the top level. 

Method-Lambdas 

A method-lambda of a class c expands into a lambda with a formal self of type c prepended to
the method-lambda's formals and a (_with-object (self:c) ...) form surrounding the body of the
lambda. The _with-object form establishes bindings in the parser's environment that associ-
ate names of c's instance variables to ivar-refs of the corresponding instance variables pointed
to by the self object. The action of _with-object is analogous to that of the symbol-macrolet con-
struct in CLOS [6].

1 See Appendix B.

2 Defclass isn't really evaluated by the reader; nevertheless, it must be a top-level form because it expands into a
top-level begin containing the internal class definition followed by definitions of accessor and predicate methods.
The internal class definition has to have been interpreted before the accessor method definitions are read; otherwise,
the reader will complain about an undefined class. Grouped forms not at the top level and not in a top-level begin
are read as a group and then interpreted as a group.

Optimist II does not restrict a lambda to referencing only one instance object; in fact, through 
inlining of method-lambdas or accessor methods, a lambda can reference many objects at the 
same time. Objects may also be referenced through the use of _with-object directly in Con- 
current Smalltalk code, but this practice is discouraged, as it uses a nonstandard feature of 
the language and gains no real functionality. 

Loops 

Although Optimist II can optimize and output code with loops in it, loops are currently not 
implemented this way. The problem is that a Concurrent Smalltalk function with a loop in it 
might execute for a long time and not allow any other messages to be processed at its node. 
To prevent this problem, loops are implemented as closures which pass themselves as argu-
ments. For example, (while (< i 10) (set i (+ i 1))) expands into:

(clet ((_loop
        (lambda ((_loop-arg:function &no-leak)) :: (_while)
          (if (< i 10)
              (set i (+ i 1))
              (return _while 'nil))
          (_loop-arg))))
  (_loop _loop))

The _loop function is called and passed itself as an argument. If i is less than 10, _loop 
increments i and calls its argument tail-recursively; otherwise, it returns nil to the caller. 
The tail-recursive call breaks the long invocation of the function. 

The compiler is not yet sophisticated enough to detect that the value of the _loop variable
never changes. If it were, the _loop-arg argument to the internal function could be eliminated
and the function could call itself recursively directly.
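
The reason the tail-recursive call helps can be seen in a toy model. The Python sketch below
assumes that each invocation is delivered as a message through a simple first-in, first-out queue
at one node; the queue, the send and run functions, and the trace list are all inventions of this
example and do not describe the MDP or Cosmos. The loop counts to 3 instead of 10 to keep the
trace short.

```python
from collections import deque

queue = deque()   # pending invocations ("messages") at one node
trace = []        # order in which work was actually performed


def send(function, *args):
    """Queue an invocation instead of calling the function directly."""
    queue.append((function, args))


def run():
    """Process queued invocations one at a time until none remain."""
    while queue:
        function, args = queue.popleft()
        function(*args)


def make_loop(state, on_done):
    """Build the body of (while (< i 3) (set i (+ i 1))) as a self-passing closure."""
    def loop(loop_arg):
        if state["i"] < 3:
            state["i"] += 1
            trace.append(("loop", state["i"]))
            send(loop_arg, loop_arg)   # the tail call becomes a new message
        else:
            on_done(None)              # return nil to the caller
    return loop


state = {"i": 0}
loop = make_loop(state, lambda value: trace.append(("done", value)))
send(loop, loop)                                   # the loop is passed itself
send(lambda: trace.append(("other", None)))        # an unrelated message
run()
```

In this model the unrelated message is handled after the first iteration rather than after the
whole loop, which is the behavior the closure-based expansion is meant to allow.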

Initial Transformations 

Immediately after the hcode is created by the parser, a transformation and an optimization 
are done on it. The first transformation flattens all exit hcodes out of every newly created 
lambda. Exit hcodes are generated by the exit Concurrent Smalltalk primitive, which may 
also be a result of the expansion of a return statement. Each exit hcode in the lambda is 
removed and the preceding statement linked to the digraph's root dinode to indicate that the 
execution of the lambda should terminate at that point. Sometimes exit hcodes can be found 
nested inside nconcurrently hcodes; if that is the case, the exit flattener moves as many of the 
nconcurrently's threads outside as it needs to remove all exit hcodes from the nconcurrently. 
Then it flattens the exits as usual. An example is shown in Figure 3-2. 

Simple structural optimizations are done immediately after the exits are flattened. These 
optimizations do not depend on dataflow analysis and can, therefore, be done before lexical 
variables are untangled. The optimizations consist of the following transformations: 

• If statements with identical consequents and alternatives are deleted.

• If statements conditioned on constants are deleted, and resulting dead code, if any, elimi-
nated.

• Move statements with identical sources and destinations are deleted.

• Assert-type statements on constants are checked and deleted. The compiler generates an
error if an assertion fails.

• Touch and force statements on constants are deleted.

• Empty nconcurrently statements are deleted.

• One-thread nconcurrently statements are replaced by their threads.

Figure 3-2. Exit Flattening Example

Exit statements are inserted by the parser in all places in which the execution of a lambda should terminate. As
the first transformation, those exit statements are removed and replaced with links back to the root of a digraph.
For example, part (a) shows the main body of a lambda with two sub-digraphs that are the threads of a nconcur-
rently. After exit removal (b), all exit paths are linked back to the root of the main body of the lambda, which also
required the inlining of one of the nconcurrently's threads.

The structural optimizations are done for two reasons: First, structural optimizations 
shorten the hcode, using less memory in the later compiler stages and making them run 
faster. Second, structural optimizations may remove some variable references, improving the 
quality of the code produced by lambda-collapsing and the nconcurrently flattener in the op- 
timization phase. 
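A few of the structural optimizations listed above can be sketched on a toy statement list. This is only an illustration in Python, not Optimist II's code: the tuple-based statement format and the names `simplify` and `is_const` are assumptions made for the example, and conditions are assumed to be side-effect-free.

```python
# Hypothetical mini-IR (not the real hcode format):
#   ("if", cond, then_stmts, else_stmts), ("move", dst, src), ("call", name)
# A constant operand is written ("const", value).

def is_const(x):
    return isinstance(x, tuple) and len(x) == 2 and x[0] == "const"

def simplify(stmts):
    """Apply three structural rewrites that need no dataflow analysis."""
    out = []
    for s in stmts:
        if s[0] == "if":
            _, cond, then_b, else_b = s
            then_b, else_b = simplify(then_b), simplify(else_b)
            if is_const(cond):
                # If conditioned on a constant: keep only the live branch.
                out.extend(then_b if cond[1] else else_b)
            elif then_b == else_b:
                # Identical consequent and alternative: the test is redundant.
                out.extend(then_b)
            else:
                out.append(("if", cond, then_b, else_b))
        elif s[0] == "move" and s[1] == s[2]:
            continue  # move with identical source and destination
        else:
            out.append(s)
    return out

example = [
    ("move", "a", "a"),
    ("if", ("const", False), [("call", "debug")], []),
    ("if", "x", [("call", "f")], [("call", "f")]),
    ("move", "b", "a"),
]
simplified = simplify(example)
```

Because the rewrites only shrink the statement list, every later pass sees less code, which is the first of the two reasons given above.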
Lambda-Collapsing

Lambda-collapsing is the process of unnesting nested lambdas. After lambda-collapsing, 
each lambda has exclusive access to its local variables. Lambda-collapsing becomes difficult 
when the inner lambdas reference the outer lambdas' local variables and continuations. 
Since continuations are restricted local variables, they will not be discussed here further. 
Lambda-collapsing occupies most of the Preoptimizer file. Since lambda-collapsing is a com- 
plex process, an illustrative example is provided at the end of this section. 

The lambda-collapser (the assign-lexicals Lisp function) examines each outermost lambda in
the hcode produced by the initial transformations. For each outermost lambda L it looks at
the lambdas N1, N2, ..., Nk nested in L and their free variables. Each nested lambda Nj is
considered to also include any lambdas nested in it. Thus, if, say, N2 contains a lambda N21
that references a variable x that is not defined in N21 or N2, then x is a free variable of both
N21 and N2. If a nested lambda Nj does not reference any free variables, it is a self-contained
lambda and a first-class data object and does not present any difficulties here. Otherwise, Nj
is the code portion of a closure.

The lambda-collapser first calculates the sets of free variables read and written by Nj. Next, 
the lambda-collapser considers each local variable Xj of L. A local Xj is called a mutable lexical 
if it is either (1) written by any Nj or (2) read by any closure Nj and written by L after the clo- 
sure Nj has been created by L and before the closure was called for the last time. Mutable 
lexicals of the first kind are easy to determine by scanning every Nj and checking which free 
variables are written in any hcode in it. To determine mutable lexicals of the second kind, 
the lambda-collapser solves a few dataflow problems on L. In effect, to each variable Xj in L, 
it assigns a state machine Sj (Figure 3-3) and uses the dataflow problem solver to run Sj 
through all possible control paths in L. If Sj ever enters state 4, Xj is a mutable lexical of the 
second kind. The state machine assumes that any local variable Xj that is modified after the
creation of a live¹ closure which reads Xj is a mutable lexical. Since the compiler cannot
currently determine when a lambda finishes executing, it cannot optimize local variables that
are modified by L only after the closures have completed execution.
Figure 3-3. Lexical Variable State Machine (state labels: Closure Exists, Closure Called)

Each local variable starts in the initial state at the beginning of the lambda. For each local variable every possible path of
control flow is traversed and a state updated as above. If the variable ever enters state 4, it must be a mutable 
lexical of the second kind — the variable's value cannot be saved with the closure when the closure is made. 



¹ If the closure is not called, it is not a live closure, and the variable is not necessarily a mutable lexical.

Any variable that is free in one of the lambdas Nj and is not a mutable lexical is an immutable
lexical. Once all of the free variables in the sublambdas of L have been classified,
the lambdas are separated. 
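The classification can be illustrated on a single control path. The Python sketch below is a simplification under stated assumptions: the real lambda-collapser runs a state machine over all control paths with the dataflow problem solver, whereas this version walks one straight-line trace of events, and the event format and the name `classify_lexicals` are hypothetical.

```python
def classify_lexicals(trace, reads, writes):
    """Classify the free variables of the nested lambdas of one outer lambda.

    trace  : events performed by the outer lambda along ONE control path,
             each ("make", closure), ("call", closure) or ("write", var)
    reads  : closure name -> set of free variables it reads
    writes : closure name -> set of free variables it writes
    """
    free_vars = set().union(*reads.values(), *writes.values())
    first_kind = set().union(*writes.values())      # written by some nested lambda
    live = {c for op, c in trace if op == "call"}   # closures called on this path
    created = set()                                 # live closures made so far
    second_kind = set()
    for op, arg in trace:
        if op == "make" and arg in live:
            created.add(arg)
        elif op == "write" and any(arg in reads[c] for c in created):
            # Outer lambda writes a variable that an existing live closure reads.
            second_kind.add(arg)
    mutable = first_kind | second_kind
    return {v: ("mutable" if v in mutable else "immutable") for v in sorted(free_vars)}

# One path through a function with three nested lambdas (names are illustrative).
trace = [("make", "inner1"), ("make", "inner2"), ("make", "inner3"),
         ("call", "inner1"), ("write", "z"), ("call", "inner2")]
reads = {"inner1": {"x", "y", "z"}, "inner2": {"y"}, "inner3": set()}
writes = {"inner1": {"x"}, "inner2": set(), "inner3": set()}
kinds = classify_lexicals(trace, reads, writes)
```

Like the compiler, the sketch is conservative: a write that follows the creation of a live closure makes the variable mutable even if the closure has already been called.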

Each sublambda Nj of L that has free variables is assigned a number of display parameters in 
addition to the normal parameters it has. The values of the display parameters are deter- 
mined at the time a closure of Nj is created. Immutable lexicals are stored directly in the dis- 
play, while mutable lexicals are stored in an object whose pointer is passed in the display. 
More than one such object may be present if Nj uses mutable lexicals from several levels of 
enclosing lambdas. 

Once the display parameters are assigned to the sublambdas, the code of L is modified to 
store the display parameters into a closure whenever one is created, and the Nj's are modified 
to use the display parameters instead of referencing L's locals directly. If L has any mutable 
lexicals, it creates an object containing them upon entry and treats mutable lexicals as if they 
were instance variables of that object; any mutable lexicals that are also parameters of L are 
copied into that object as soon as it is created. The object containing mutable lexicals is itself 
mutable, so only one copy of it per invocation of L can be present on the J-Machine. The ob- 
ject is not disposed because Optimist II cannot determine the temporal lifetime of a closure; 
the object and the closures have to be garbage-collected. 
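The effect of this transformation can be approximated in Python. The sketch below is an analogy rather than generated code: it models a closure as a code function plus a display tuple, keeps the mutable lexicals x and z in a shared heap object, and copies the immutable lexical y by value. It is loosely modeled on the outer function of the example later in this section; the names `LexicalObject` and `make_closure` are assumptions, and the write primitive is replaced by returning values.

```python
class LexicalObject:
    """Heap object holding the mutable lexicals of one invocation of outer."""
    def __init__(self, **slots):
        self.__dict__.update(slots)

def make_closure(code, display):
    """A closure is a code pointer plus display values fixed at creation time."""
    return lambda *args: code(display, *args)

def inner1_code(display):
    lex, y = display            # display: pointer to lexical object, copy of y
    lex.x = lex.z               # mutable lexicals are reached through the object
    return (lex.x, y)

def inner2_code(display):
    (y,) = display              # immutable lexical stored directly in the display
    return y

def outer(x):
    lex = LexicalObject(x=x, z=4)   # parameter x is copied in on entry
    y, t = 3, 1                     # ordinary locals of outer
    inner1 = make_closure(inner1_code, (lex, y))
    inner2 = make_closure(inner2_code, (y,))
    results = []
    if lex.x == 0:
        results.append(inner1())
    else:
        lex.x = 5
    lex.z = 3
    results.append(inner2())
    results.append((lex.x, y, lex.z, t))
    return results
```

Every closure created by one invocation shares the same `lex` object, which is why only one copy of it can exist, while the copied `y` can be replicated freely.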

After the above transformation, L has exclusive access to its locals. Since some of the Nj's 
could themselves have locals used by their sublambdas, the lambda-collapser calls itself re- 
cursively on every lambda and closure contained in L, even if that lambda did not have any 
external free variables. 

Efficiency Considerations 

There are several advantages to using immutable lexicals instead of mutable lexicals:

• Immutable lexicals are stored directly in a closure's display, so the closure has immediate 
access to their values. 

• Closures are immutable objects. If many closures are executing simultaneously, many 
copies of the closures and their immutable lexicals can be made. On the other hand, if many 
copies of a closure with a mutable lexical are executing, the copies will be contending for the 
single object containing that lexical's current value. 

• The outer lambda can store immutable lexicals in its context or in registers, while it has 
to allocate an object for mutable lexicals and keep their values there. 

In order to ensure that lexically scoped variables are immutable lexicals, the programmer 
should check that their values are not altered after any closures which might reference them 
are created. 

Example 

Consider the following code: 
    (defun outer (x)
      (clet ((y 3)
             (z 4)
             (t 1))
        (clet ((inner1
                (lambda ()
                  ((lambda () (cset x z)))   ; inner11
                  (write x y)))
               (inner2
                (lambda ()
                  (write y)))
               (inner3
                (lambda (a)
                  (write a))))
          (if (zero? x)
              (inner1)
              (cset x 5))
          (cset z 3)
          (inner2)
          (write x y z t))))



The lambda-collapser first determines that the outer lambda has no free variables, so it is
made into a normal function instead of a closure. Next it examines the three sublambdas
within outer: inner1, inner2, and inner3. Inner1 will become a closure because it has three free
variables, x, y, and z. It writes to x, so x becomes a mutable lexical; although inner1 does not
write to y and z, another lambda might, so y's and z's statuses are unknown. Inner2 will also
become a closure because it has one free variable, y, whose status is still unknown. Since
inner3 has no free variables, it becomes a normal function.

Next the lambda-collapser runs the state machines on the x, y, and z locals in outer; outer
also has other locals such as t, inner1, inner2, and inner3, but those are not referenced by any
inner lambdas. X is already known to be a mutable lexical of the first kind. Y is not written
anywhere after inner1 and inner2 are created, so it is an immutable lexical. Z is written after
the inner1 closure is created, and the compiler makes it a mutable lexical of the second kind.
Unfortunately, the compiler does not realize that z is altered only after inner1 finishes
executing; if it were smarter, it could have made z an immutable lexical. Finally, the
lambda-collapser creates the displays and alters the code of the lambdas to produce the
parameter-passing pattern shown in Table 3-6.

Table 3-6. Lambda-Collapser Example Results

Name      Parameters          Returns          Display             Locals
Outer     x (copied into      continuation-0   -                   y, t, inner1, inner2,
          lexical-object)                                          inner3, lexical-object
Inner1    -                   continuation-1   lexical-object, y   -
Inner11   -                   continuation-2   lexical-object      -
Inner2    -                   continuation-3   y                   -
Inner3    a                   continuation-4   -                   a

Object            Instance Variables
lexical-object    x, z
Top-Level Evaluator

Lambda-collapsing was the last preliminary hcode transformation. At this point the hcode is 
in a format understood by the interpreter. If it found no syntax errors, Optimist II now eval- 
uates the Concurrent Smalltalk expression it just read by running the expression's hcodes 
through the hcode interpreter. If the expression contained any directives, the interpreter ex- 
ecutes them at this time. 

Interpreter 

The interpreter is a simple hcode interpreter for executing Concurrent Smalltalk programs. 
The interpreter is completely sequential. Except for full futures and some unimplemented 
input/output facilities, the interpreter is a valid Concurrent Smalltalk implementation — the 
Concurrent Smalltalk definition allows cfutures to be touched at the implementation's dis- 
cretion, so a completely sequential Concurrent Smalltalk interpreter trivially "touches" each 
cfuture as soon as it is created. While the interpreter never achieves any parallelism, it 
couldn't use parallelism if it had any because it is running on a sequential computer. 

The interpreter in Optimist II was provided for four reasons:

• It is a powerful constant expression evaluator for expressions encountered while compil- 
ing Concurrent Smalltalk programs. 

• It is the most interactive Concurrent Smalltalk environment, allowing methods and func- 
tions to be changed almost instantly. 

• It permits debugging of Concurrent Smalltalk programs before they are compiled into 
MDP assembly language. 

• It maintains the Concurrent Smalltalk global environment and permits interactive exam- 
ination of that environment. 

Currently the interpreter can only interpret unoptimized hcode; however, a bypass hcode 
path could be added to transfer optimized hcode back to the interpreter. This bypass is not 
quite as simple as it sounds because the format of continuations changes during optimiza- 
tion. 
3.3. Optimization

As long as no MDP code output is desired, Optimist II does not leave its first phase. Only 
when a compile command is issued does Optimist II enter its second phase, its first goal be- 
ing to determine just what it should compile. Every compile command requires a root set of 
objects that should be compiled. The compiler uses the treewalker to automatically deter- 
mine the minimum amount of code that has to be compiled and loaded in order to permit 
running the functions in the root set on the J-Machine. 

Treewalker 

The root set specified in the compile command is passed to the treewalker, which appends it 
to its own permanent root set of objects which must always be compiled (Table 3-7). The 
treewalker then calls the optimizer on each code object in its set and scans the optimized 
hcode (if the object is not code, the treewalker scans it directly). If, while scanning, it en- 
counters an object not in its current set of objects, it adds that object to its set, optimizes it if 
necessary, and scans it. The process continues until every object referenced by any object in 
the treewalker's set is also in that set. At that point the second phase of the compiler has 
completed and the treewalker calls the compiler's third stage to compile and assemble each 
object in the set and print the resulting MDPSim code into a text file. 
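The treewalker's scan is essentially a reachability closure over object references. The Python sketch below shows that idea with a worklist; the function names, the dictionary-based object graph, and the stand-in `optimize` callback are assumptions for illustration, and the real treewalker scans optimized hcode rather than a prepared reference table.

```python
def treewalk(roots, permanent_roots, references, optimize=lambda obj: obj):
    """Return every object reachable from the root sets, in discovery order.

    references(x) yields the objects mentioned by x; optimize(obj) stands in
    for requesting an object's optimized hcode before it is scanned.
    """
    pending = list(permanent_roots) + list(roots)
    seen, order = set(), []
    while pending:
        obj = pending.pop()
        if obj in seen:
            continue                      # already in the treewalker's set
        seen.add(obj)
        order.append(obj)
        for ref in references(optimize(obj)):
            if ref not in seen:
                pending.append(ref)       # newly discovered object: scan it later
    return order

# Hypothetical object graph: each object lists the objects it references.
graph = {
    "main": ["fib", "integer"],
    "fib": ["fib", "+"],
    "+": ["integer"],
    "integer": [],
    "unused": ["main"],
}
compiled = treewalk(["main"], ["integer"], lambda obj: graph[obj])
```

The loop stops exactly when every object referenced by a member of the set is itself a member, so `unused` is never compiled.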

Table 3-7. Permanent Root Objects

closure          boolean        character        #:class             context
#:continuation   displacement   distobj          distributed-class   #:false
float            funct          function         global              integer
magnitude        null           number           object              primitive-class
real             selector       standard-class   symbol              #:true

These objects are emitted in the output assembly file regardless of which objects were compiled. closure,
context, displacement, #:continuation, and global are internal Optimist II classes.

Calling the Optimizer 

The optimizer is called simply by requesting the value of the hcode or mdp-hcode CLOS slot 
in a Concurrent Smalltalk lambda (cst-lambda). If the lambda has already been optimized, 
these slots contain the optimized hcode and hcode optimized for the MDP, respectively. If 
not, those slots are unbound, and CLOS calls the optimizer to calculate their values. Thus, a 
lambda's optimized hcode can be requested repeatedly by the treewalker or the optimizer 
without a performance penalty. To prevent infinite loops, a semaphore keeps a function op- 
timizing a lambda from requesting that lambda's optimized hcode. One of the consequences 
of this rule is that a function may not be inlined inside itself. 
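The combination of a lazily computed slot and a reentrancy guard can be sketched as follows. This Python version is an analogy to the CLOS mechanism described above, not a transcription of it: the class `CstLambda`, its method names, and the rule of inlining every available callee are assumptions made to keep the example short.

```python
class CstLambda:
    """Toy lambda whose optimized hcode is computed on first request."""

    def __init__(self, name, body, env):
        self.name, self.body, self.env = name, body, env
        self._hcode = None      # plays the role of the unbound slot
        self._busy = False      # plays the role of the semaphore
        self.runs = 0           # how many times the optimizer actually ran
        env[name] = self

    def hcode(self):
        if self._hcode is not None:
            return self._hcode  # already optimized: no extra cost
        if self._busy:
            return None         # refuse requests made while optimizing ourselves
        self._busy = True
        try:
            self.runs += 1
            self._hcode = self._optimize()
        finally:
            self._busy = False
        return self._hcode

    def _optimize(self):
        out = []
        for op in self.body:
            callee = self.env.get(op[1]) if op[0] == "call" else None
            inlined = callee.hcode() if callee is not None else None
            if inlined is None:
                out.append(op)          # keep the call (e.g. recursive call)
            else:
                out.extend(inlined)     # splice in the callee's optimized code
        return out

env = {}
f = CstLambda("f", [("prim", "add"), ("call", "f")], env)
g = CstLambda("g", [("call", "f"), ("prim", "mul")], env)
g_code = g.hcode()
g_again = g.hcode()
```

The recursive call inside `f` survives as a call because `f` cannot obtain its own optimized code while it is being optimized, which mirrors the rule that a function may not be inlined inside itself.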

Guide to Optimizations 

The transformations done by the optimizer are summarized in Figure 3-4. The transforma- 
tions can be divided roughly into two classes: general hcode optimizations and MDP-specific 
optimizations and transformations. The general optimizations occupying the first half of the 
optimizer produce optimized hcode. If MDP assembly code output is desired, the second half 
of the optimizer is invoked to convert a number of hcode constructs into simpler, MDP-spe- 
cific ones. For example, the second half of the optimizer converts globals into references to 
global objects, CAS built-ins into code that explicitly compares and sets values, and three-ar- 
gument sums into two two-argument sums. The order of optimization is critical; expansion 
of CASes into compare-and-set code could not have been done in the first half of the optimizer 
because there was no way to assure its atomicity. 
Figure 3-4. Optimizer Organization

The first few filters convert the hcode produced by the initial phase into a format usable by the optimizer. The it- 
erative optimizer and function inliner perform the major optimizations. The remaining filters implement some Con- 
current Smalltalk features out of more basic ones and fix a few quirks in the Cosmos and MDP architectures. 
The transformations new in Optimist II will be described in the order in which they are
performed. The following optimizations will not be described here because they were present in
Optimist and their concepts have not changed significantly:

• Dead Definition Eliminator 

• Move eliminator 

• Touch eliminator 

• Tail forwarder 

• Fork merger 

• Join merger 

Also, the structural optimizer was used in the first phase of the compiler and is described 
there. 

Preparatory Transformations 

Lambda Copier and Structural Optimizer 

Before optimizing a lambda, the optimizer first makes a copy of it to avoid destroying the 
copy used by the global environment and the interpreter. While copying the lambda, the op- 
timizer assigns consecutive indices to the lambda's local variables. These indices will allow 
the use of fast bitmaps to represent local variable data during later dataflow analysis rou- 
tines. 

At this stage the optimizer replaces all references to parameters¹ with their values. Once the
compilation process has begun, values of parameters cannot change, and replacing pa- 
rameters with constants as early as possible enables early constant folding and dead code 
elimination. Parameters are usually used to hold global functions and compiler conditionals 
such as a debugging flag. Debugging code can be compiled conditionally by enclosing it 
within an if statement conditioned on a debug parameter. If debug is false, the code and the 
if statement are removed by the structural optimizer immediately following the copier; the 
remaining optimizations don't even see that code. Dead code is best removed early because 
removing it enlarges basic blocks, permits additional function inlining, and improves the per- 
formance of the dataflow optimizer and tail forwarder. It is unfortunate that conditional de- 
bugging code cannot be removed before lambda collapsing, but doing that would prevent 
changes in the debug parameter from having any effect on existing code. 

The structural optimizer cleans the code to give the nconcurrently flattener maximum lati- 
tude in scheduling nconcurrently hcodes. 

Nconcurrently Flattener 

The nconcurrently flattener removes nconcurrently hcodes from the lambda being optimized. 
Later optimizations run many dataflow calculations on the lambda, and the presence of 
nconcurrentlys would complicate dataflow analysis and make some optimizations less effec- 
tive. In the interest of compiler simplicity I decided to remove nconcurrentlys at this stage. 

The nconcurrently flattener uses a heuristic to interleave the nconcurrentlys it is flattening.
If it finds a nconcurrently statement with more than one thread, it first calls itself recursively
on each thread and then separates each thread into a leading and a trailing set of statements.
A thread's trailing set of statements contains the longest string of consecutive hcodes
at the end of the thread which are not considered worth advancing relative to other hcodes in
the lambda. The trailing set cannot contain any forks or joins of flow-of-control paths. All
other statements in the thread are placed in the thread's leading set. Once the nconcurrently
flattener separates each thread into the two sets, it replaces the nconcurrently hcode with all
leading sets concatenated together followed by all trailing sets concatenated together.

¹ In the paragraph on replacing parameters with their values, parameters means parameter globals defined in
Section A.3, not function parameters.

Hcodes worth advancing are non-built-in function and method calls and any hcodes which re- 
turn values through continuations; all other hcodes are not considered worth advancing. 
Hcodes not worth advancing are pushed as far back as possible by the nconcurrently flat- 
tener, which displaces hcodes worth advancing forward. 
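The leading/trailing split can be sketched in a few lines of Python. The sketch assumes that each thread has already been flattened recursively and uses a hypothetical tuple format for hcodes in which the first element names the kind of statement; the set of kinds treated as worth advancing is a stand-in for the rule stated above.

```python
WORTH_ADVANCING = {"call", "reply"}      # non-built-in calls, value-returning hcodes
CONTROL = {"fork", "join"}               # may never appear in a trailing set

def split_thread(thread):
    """Split a thread into (leading, trailing) statement lists."""
    i = len(thread)
    # Grow the trailing set backwards while statements are not worth advancing.
    while i > 0 and thread[i - 1][0] not in WORTH_ADVANCING | CONTROL:
        i -= 1
    return thread[:i], thread[i:]

def flatten_nconcurrently(threads):
    """Replace an nconcurrently by all leading sets, then all trailing sets."""
    leading, trailing = [], []
    for thread in threads:
        lead, trail = split_thread(thread)
        leading.extend(lead)
        trailing.extend(trail)
    return leading + trailing

threads = [
    [("call", "f"), ("move", "a", "r1")],
    [("move", "t", "1"), ("call", "g"), ("move", "b", "r2")],
]
flat = flatten_nconcurrently(threads)
```

Both calls are issued before either result is moved, so the two calls can proceed concurrently even though the flattened code is a single sequence.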

The nconcurrently flattener could use more complicated heuristics to increase parallelism.
For example, it could realize that no matter how it orders function calls in statements such
as (f (a (b 1)) (c (d 2))), there would remain a possibility of a loss of concurrency
caused by touching the intermediate results (b 1) and (d 2) in the wrong order. Hence, it
could split the calculation of, say, (a (b 1)) into a separate function call to avoid a
potential loss of concurrency. Nevertheless, the nconcurrently flattener's current heuristic seems
adequate.

Continuation Expander 

The continuation expander is the one MDP-specific transformation that is done early. So far 
in the compiler, continuations have been represented as single words, while on an MDP a 
continuation is two words — the context to which the continuation is pointing and an offset of 
a slot within the context where the return value should be stored. I originally planned to 
implement continuations as a special case of inline objects, but writing a general implemen- 
tation of inlined objects would have been too time-consuming and inappropriate for an initial 
version of the compiler. Hence, I included a partial implementation of inline objects that 
only inlines continuations. 

The continuation expander expands each local variable of type continuation into two vari- 
ables, one of type context and the other of type displacement. Similarly, each formal and 
display parameter typed continuation is made to correspond to two local variables. A 
move hcode moving a continuation is changed into two moves, while an application hcode calls 
its function with both new locals as arguments. 
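As a rough illustration of this expansion, the Python sketch below splits every continuation-typed variable into a context variable and a displacement variable and rewrites moves and applications accordingly. The variable-naming scheme (`.ctx`, `.disp`) and the tuple format of the hcodes are assumptions; the sketch also assumes well-typed input in which a continuation is only ever moved into another continuation.

```python
def expand_continuations(types, hcodes):
    """Return (new_types, new_hcodes) with continuations split into two words."""
    new_types = {}
    for name, ty in types.items():
        if ty == "continuation":
            new_types[name + ".ctx"] = "context"
            new_types[name + ".disp"] = "displacement"
        else:
            new_types[name] = ty

    def split(var):
        if types.get(var) == "continuation":
            return [var + ".ctx", var + ".disp"]
        return [var]

    out = []
    for h in hcodes:
        if h[0] == "move":
            _, dst, src = h
            # One move of a continuation becomes two moves, one per word.
            out.extend(("move", d, s) for d, s in zip(split(dst), split(src)))
        elif h[0] == "apply":
            _, fn, args = h
            # The callee receives both new locals as separate arguments.
            out.append(("apply", fn, [part for a in args for part in split(a)]))
        else:
            out.append(h)
    return new_types, out

types = {"k": "continuation", "k2": "continuation", "n": "integer"}
hcodes = [("move", "k2", "k"), ("apply", "f", ["n", "k2"])]
new_types, new_hcodes = expand_continuations(types, hcodes)
```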

Changing structures of instance objects and global variables containing continuations is hard 
at this stage of compilation, so to avoid this problem continuations have not been made first- 
class objects — there is no way to store a continuation in an instance variable of an object; 
disallowing programmer-visible continuation local variables ensures that no continuation be- 
comes a mutable lexical which would get stored in an instance object. 

Iterative Optimizations 

The iterative optimizations perform general dataflow and constant propagation optimiza- 
tions. They are called in a loop until none of them changes the lambda. All of the optimiza- 
tions were altered in some way since Optimist; most had to be updated to handle multiple re- 
turn values and typed variables, and some were changed because reply is no longer an ex- 
plicit hcode. However, only the new features will be described below. 

Type Specializer 

Local variables in Optimist II are associated with types in two ways: 

1. The variable itself has a type supplied by the programmer when the variable is declared. 
This type applies throughout the variable's lifetime. 

2. The programmer can declare types through the use of the type assertion primitive 
(Section A.6), or the compiler can infer from its knowledge about the types of function and 
method arguments and results that a variable has a particular type at a given point in the 
lambda. These type assertions apply only to a particular point in the variable's lifetime. 
Each type asserted in this manner must have a non-null intersection with the variable's type;
otherwise, no legal value could be stored in the variable and Optimist II generates an error. 

The type specializer examines each variable and calculates the lub of the types it can assume 
throughout its lifetime, combining all the knowledge it has from assertions of the second 
kind. It then intersects the variable's type with the lub and makes that the variable's new, 
more restricted type. 
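The computation can be illustrated with a deliberately simple model of types. In the Python sketch below a type is a set of concrete class names, so the lub of several types is their union and type intersection is set intersection; this set model, the sample classes, and the name `specialize` are assumptions, since the source does not describe how Optimist II represents its type lattice.

```python
def specialize(declared, asserted):
    """Narrow a variable's declared type using point-wise type assertions.

    declared : frozenset of class names the variable may hold
    asserted : list of frozensets, one per type assertion on the variable
    """
    for t in asserted:
        if not (declared & t):
            # No legal value could satisfy both the declaration and the assertion.
            raise TypeError("assertion incompatible with declared type")
    if not asserted:
        return declared
    lub = frozenset().union(*asserted)   # everything the variable can assume
    return declared & lub                # the new, more restricted type

OBJECT = frozenset({"integer", "float", "symbol", "null"})
NUMBER = frozenset({"integer", "float"})
narrowed = specialize(OBJECT, [frozenset({"integer"}), frozenset({"float"})])
```

A temporary declared as object but only ever asserted to hold integers or floats is narrowed to a numeric type, so merging it into another variable no longer widens that variable to object.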

Type specialization is done to improve the quality of the move elimination optimization and 
to permit inlining of values in the future. When the move eliminator merges two variables, it 
sets the new variable's type to the lub of the variables' types. The temporaries created by 
other optimizations often have type object even though they can contain only more re- 
stricted values, and if one of them were merged into an existing variable, that variable's type 
would also become object unless the temporary's type were specialized first by the type spe- 
cializer. When Optimist II supports inline classes in the future, type specialization of a vari- 
able to an inlineable class will permit some objects such as double-precision floating point 
numbers and locks to be inlined in local variables. 

Dataflow Optimizer 

The dataflow optimizer has an extra optimization in addition to those mentioned in [21]. The
dataflow optimizer has always checked whether an if statement would always branch one way
and, if so, eliminated the if statement and the dead branch. In addition to that check, if the
if statement has several predecessors, the dataflow optimizer now checks separately for each
predecessor whether arriving from it would cause the if statement to always branch one way;
if so, that predecessor is connected directly to the corresponding branch of the if statement.
This situation arises often when sc-and and sc-or are used. A code fragment like the one
below is generated for (if (sc-and a b) (f)):

    (IF :FALSE (LOCAL CST::A) 2246)
    (MOVE (LOCAL 387) (LOCAL CST::B))
    (JUMP 2248)
    (LABEL 2246)
    (MOVE (LOCAL 387) #<False>)
    (LABEL 2248)
    (IF :FALSE (LOCAL 387) 2252)
    (APPLY NIL (#<Lambda CST::F>))
    (LABEL 2252)

It is optimized to:

    (IF :FALSE (LOCAL CST::A) 2252)
    (IF :FALSE (LOCAL CST::B) 2252)
    (APPLY NIL (#<Lambda CST::F>))
    (LABEL 2252)

Constant Folder 

The constant folder performs two duties: it evaluates constant expressions and replaces
method calls with function calls. The constant folder examines each application statement in
the lambda. If the arguments are all values, the function or method to be invoked is
side-effect-free, and the precise mode is off¹, the constant folder calls the interpreter to
evaluate the function or method call and replace it with move statements of the results to the
application's targets. If the interpreter generates an error, the compiler aborts the
compilation; the error is not hidden until runtime. The call could potentially invoke many
functions and methods.

One has to be a little careful with this optimization: if all inputs to a program are specified,
Optimist II is perfectly willing to precalculate the program's results and compile the entire
program to a single function that returns the answer. This will happen often on benchmarks,
especially when Optimist II learns how to automatically determine which functions are
side-effect-free; currently it assumes that a function is not side-effect-free unless explicitly
declared so by the programmer.

¹ See Appendix B.

In addition to evaluating applications with all arguments specified, the constant folder also 
simplifies built-in operations such as arithmetic and logical primitives according to the iden- 
tity rules listed in Table A-4. 
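The first duty, evaluating applications whose arguments are all known, can be sketched as below. This Python fragment is a minimal illustration under assumptions: the statement format, the `SIDE_EFFECT_FREE` table, and the `precise` flag are hypothetical stand-ins, Python's own operators play the role of the interpreter, and the identity rules of Table A-4 are not modeled.

```python
import operator

# Functions the programmer has declared side-effect-free (illustrative only).
SIDE_EFFECT_FREE = {"+": operator.add, "*": operator.mul}

def is_const(x):
    return isinstance(x, tuple) and len(x) == 2 and x[0] == "const"

def fold(stmt, precise=False):
    """Fold ("apply", target, fn, args) into a move when it is safe to do so."""
    if stmt[0] != "apply":
        return stmt
    _, target, fn, args = stmt
    if precise or fn not in SIDE_EFFECT_FREE:
        return stmt                      # must be left for runtime
    if not all(is_const(a) for a in args):
        return stmt                      # some argument is not a known value
    # Any exception raised here propagates, much as a failed evaluation
    # aborts the compilation instead of being deferred to runtime.
    value = SIDE_EFFECT_FREE[fn](*(a[1] for a in args))
    return ("move", target, ("const", value))

folded = fold(("apply", "r", "+", [("const", 2), ("const", 3)]))
```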

When the constant folder encounters a method call, it looks in the selector's table of methods 
and selects all methods which match the number and types of arguments provided. The ar- 
guments' types are determined by dataflow analysis of the same information as is used by 
the type specializer. If no methods match, the constant folder signals an error. If exactly one 
method matches, the constant folder replaces the method dispatch with a direct call of the 
method's code, which may even be inlined later. If two methods match, the constant folder 
uses a heuristic to determine whether it is better to do a standard method dispatch or to get 
the type of the first argument and call one of the two methods depending on that type. 

The heuristic is as follows: The classes of the first arguments accepted by the two methods
are determined. If the two classes are disjoint, the constant folder picks the class that is
easier to check¹. If one is a subclass of the other, the constant folder picks the subclass;
otherwise, the constant folder gives up and does not optimize. If the picked class is easier to
check than doing a method dispatch, the constant folder replaces the application with a call
to the class's predicate followed by an if statement with direct calls to the two methods on the
two sides of the conditional.
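The choice of which class to test can be sketched as follows. The Python fragment assumes a single-inheritance hierarchy, in which two classes are disjoint exactly when neither is a subclass of the other, so the "give up" case for overlapping classes cannot arise in this model; the class table and the ease-of-check integers are invented for the example.

```python
# Hypothetical hierarchy and ease-of-check values (0 = no easier than dispatch).
PARENT = {"boolean": "object", "null": "object", "pair": "object", "object": None}
EASE = {"boolean": 6, "null": 7, "pair": 0, "object": 0}

def subclass_of(a, b):
    """True if class a is b or inherits from b."""
    while a is not None:
        if a == b:
            return True
        a = PARENT[a]
    return False

def choose_test_class(a, b):
    """Pick the class whose predicate should guard the two direct calls.

    Returns None when a standard method dispatch is at least as good.
    """
    if subclass_of(a, b):
        picked = a                                   # test for the subclass
    elif subclass_of(b, a):
        picked = b
    else:
        picked = a if EASE[a] >= EASE[b] else b      # disjoint: easier check wins
    return picked if EASE[picked] > 0 else None
```

For example, with methods specialized on null and pair the sketch tests for null, whereas with pair and object it keeps the ordinary dispatch because testing for a user-defined class is no cheaper.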

Function Inlining 

Functions are inlined after all iterative optimizations have been performed and can yield no
more improvements. To inline functions, the function inliner considers each function call² in
the lambda. If the function is not a built-in and is considered inlineable, the function inliner
attempts to inline it; however, there is no a priori guarantee that it will succeed. A function
is considered inlineable if it is either declared inline by the user or heuristically inlineable
and not declared not-inline by the user. To be heuristically inlineable, a function has to be
small: its optimized hcode can contain no more than two full-fledged function or method
calls and no more than twelve built-in calls. A point system is used to determine a function's
"size"; the threshold can be varied by adjusting the inline-size-cutoff option.

Furthermore, to prevent object thrashing on the J-Machine, a function is heuristically
uninlineable if it references an instance variable of the object passed as its first argument,
unless the caller passes its own first argument through as the first argument of that
function. To see an example of this rule, consider the function sum4 in:

    (defclass pair () car cdr)

    (defun sum4 (p:pair q:pair) (+ (car p) (car q) (cdr p) (cdr q)))

The car and cdr accessor methods are well under the size threshold. However, only the
(car p) and (cdr p) calls are inlined; (car q) and (cdr q) are not because the calling
function sum4 does not pass its first argument p as car's or cdr's first argument. There is
no problem with inlining (car p) and (cdr p) into direct accesses of p's instance variables
because sum4 is executed on the same node on which p resides. However, if sum4 were to
reference q's instance variables directly, it would force q to travel to the same node on which
p resides, thrashing q. Instead, sum4 calls (car q) and (cdr q) in the usual manner, and
the car and cdr methods are executed on q's node and return their results to sum4 running
on p's node.

¹ Each class has an integer that specifies how easy it is to test an arbitrary object for membership in that class. If
that integer is zero, doing this check is no easier than doing a method dispatch; if that integer is a high positive
value such as six or seven, this test can be done in one or two assembly language instructions. Built-in classes such
as boolean or null allow easy membership checking, while user-defined classes do not.
² Method calls cannot be inlined unless they were previously converted into function calls by the Constant Folder.

The function inliner tries to avoid forcing objects to migrate whenever possible. This is not 
necessarily the optimal strategy — in some cases it might be better to migrate an object to a 
method that accesses it frequently— but the desirability of migrating the object is difficult to 
determine by the compiler because it depends on the frequency of the object's use by other 
processes in the system. Thus, the simple solution of minimizing object migration was taken; 
in the cases outlined above, a method that makes numerous distant object accesses can usu- 
ally be rewritten as several communicating methods which only access local objects. 

Once the function inliner decides that it would like to inline a function, it attempts to
inline the function's optimized hcode. Nevertheless, it might still encounter difficulties if the
inlined function performs nontrivial processing after it returns its result. For example,
consider the functions silly-add, shell1, and shell2:

    (defun silly-add (x y)
      (reply (+ x y))
      (prove-fermats-last-theorem)
      (exit))

    (defun shell1 (x y)
      (clet ((z (silly-add x y)))
        (+ z 5)))

    (defun shell2 (x y)
      (silly-add (+ x 5) y))

If the function inliner were to inline silly-add in shell1, it might convert a terminating
program into a nonterminating one (assuming prove-fermats-last-theorem does not
terminate in any reasonable amount of time in this example) because shell1 would try to
execute all of silly-add before continuing with the addition of 5 to z. Thus, the function
inliner should not inline any function that performs nontrivial processing after it replies to
its caller. On the other hand, there is nothing wrong with inlining silly-add in shell2 as
long as shell2 is tail-forwarded because shell2 would still return the sum to its caller
before trying to prove Fermat's last theorem. Other interesting scenarios with callers and
callees accessing the same lock are also possible.

The general rule for determining whether it is safe to inline a function is as follows: inlining 
is safe unless the inlined function performs nontrivial processing after it has replied to the 
caller with all return values that the caller is not tail-forwarding. It does not matter if or when 
the inlined function replies to any other functions in whose lexical environment it might be; i.e., 
non-local lexical returns by the inlined function are fine as long as they don't transfer control 
to the caller 1 . 

After copying the inlined function, the function inliner implements the above rule. It runs a 
dataflow analysis on the continuation local variables in the inlined function to determine 
where each continuation reference can return its value; if it has any problems with perform- 
ing this analysis, it does not inline the function. Next, the function inliner uses the dataflow 
problem solver again to verify that no statement that returns a value to the caller is followed 
by any statement that might not terminate. 
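The rule can be illustrated with a small Python sketch over a straight-line list of statements. The statement encoding (dictionaries with `replies_to_caller` and `may_not_terminate` flags) and the name `safe_to_inline` are assumptions made for this example only; the actual inliner operates on hcode and uses its dataflow problem solver rather than a linear scan, and it handles multiple return values.

```python
def safe_to_inline(statements, caller_tail_forwards=False):
    """Return True if no reply to the caller is followed by a statement
    that might not terminate. Straight-line code, single reply value."""
    if caller_tail_forwards:
        # The caller forwards the reply itself, so work done after the
        # reply cannot delay the value the caller's own caller receives.
        return True
    replied = False
    for stmt in statements:
        if replied and stmt.get("may_not_terminate", False):
            return False
        if stmt.get("replies_to_caller", False):
            replied = True
    return True


# Hypothetical encoding of silly-add from the example above.
silly_add = [
    {"op": "reply", "replies_to_caller": True},
    {"op": "prove-fermats-last-theorem", "may_not_terminate": True},
    {"op": "exit"},
]
```

Under these assumptions, `safe_to_inline(silly_add)` rejects inlining into a caller that continues computing with the result, while `safe_to_inline(silly_add, caller_tail_forwards=True)` accepts it for a tail-forwarding caller.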

Once all of these conditions are satisfied, the function inliner splices the inlined function's 
code and local variables into the caller. Then it introduces move statements to move the 
caller's arguments to the appropriate locals in the callee. Unless the callee is non-strict, each 
argument is touched as it is moved. Also, the statements returning values from the callee to 
the caller are modified to store the values in more temporaries, which are moved to their 
proper destinations after the spliced callee's code. Needless to say, the move eliminator will 
have a lot of work cleaning up the extra moves just introduced, but they are necessary to 
make sure that functions are inlined correctly in all cases. 

To make sure that the compiler terminates, it does only one pass of function inlining for each 
lambda; otherwise, it could peel invocations of recursive functions forever. However, the sin- 
gle pass of inlining does not mean that functions are only inlined one level deep; on the con- 
trary, the callees are themselves fully optimized before being considered for inlining, and in 
the process of being optimized they may let other functions be inlined into them. It is true, 
though, that the treewalker's antirecursion rules prevent a function from being inlined into 
itself. 

Once all potential functions are inlined, Optimist II performs another pass of iterative opti- 
mizations to clean up and optimize the code introduced by the inlining process. 

Cleanup Transformations 

Just one final cleanup transformation is done on hcode. The preceding optimizations gener- 
ated a number of local variables in the lambda, many of which are no longer used. The local 
eliminator removes all unused locals and renumbers the remaining locals to fill the gaps. 
This simple transformation has no effect on the code generated by the compiler because Op- 
timist II's third phase will compact the locals anyway. The local eliminator is present solely 
for aesthetic and compilation speed reasons — hcode is less readable if it has many unused lo- 
cal variables. Also, since variable bitmaps are represented as integers, the dataflow code 
runs much faster if no more than about thirty variables are present so Lisp can use fixnums 
instead of bignums. 
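A minimal Python sketch of this kind of renumbering follows. The statement form (a target local plus a list of source locals) and the function name are hypothetical; the sketch uses an integer as the variable bitmap, as described above, but does not model hcode itself.

```python
def eliminate_unused_locals(n_locals, statements):
    """statements: list of (target, sources) using local indices;
    target may be None. Returns (new_count, renumbered_statements)."""
    used = 0  # bitmap of referenced locals, one bit per local
    for target, sources in statements:
        if target is not None:
            used |= 1 << target
        for s in sources:
            used |= 1 << s
    # Map each surviving local to a new, gap-free index.
    renumber = {}
    for old in range(n_locals):
        if (used >> old) & 1:
            renumber[old] = len(renumber)
    new_statements = [
        (None if t is None else renumber[t], [renumber[s] for s in srcs])
        for t, srcs in statements
    ]
    return len(renumber), new_statements
```

For instance, with six locals of which only 0, 2, 4, and 5 are referenced, the result uses locals 0 through 3.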

MDP-Specific Transformations 

A number of MDP-specific transformations have to be done on hcode before MDP assembly 
code can be generated. These transformations and optimizations are sketched below and are 
listed in the Postoptimizer file. 

Global Expander 

The global expander implements global variables as instances of the global class. Each ref- 
erence to a global variable is replaced with a reference to a global instance object's 
global-value slot 1 . The global instance object itself is a mutable immediate object; its ID 
and initial value are known to the compiler, so the instance object can be referenced by any 
lambda without having to access another global. 

Addressing Mode Flattener 

The addressing mode flattener flattens nested hcode lvalue and rvalue expressions 2 because 
the assembly language compiler can only compile one-level expressions. Whenever the ad- 
dressing mode flattener finds a nested lvalue or rvalue expression, it unnests it and precedes 
the hcode containing it with other hcodes that calculate the expression's components and 
store them in local variables. 
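The unnesting step can be sketched in Python as follows. Expressions are modeled as nested tuples `(op, arg, ...)` with strings or integers as atoms; this representation, the `slot` operator name, and the temporary names `t0`, `t1`, ... are assumptions for illustration and are not the hcode forms of Tables 5-3 and 5-4.

```python
import itertools


def flatten(expr, emit, fresh):
    """Return a one-level version of expr. Nested subexpressions are
    computed into fresh locals by statements passed to emit()."""
    if not isinstance(expr, tuple):
        return expr  # already an atom
    op, *args = expr
    flat_args = []
    for arg in args:
        if isinstance(arg, tuple):
            inner = flatten(arg, emit, fresh)
            temp = fresh()
            emit((temp, inner))  # temp := inner, placed before the user
            flat_args.append(temp)
        else:
            flat_args.append(arg)
    return (op, *flat_args)


def flatten_expression(expr):
    """Return (preceding_statements, flat_expression)."""
    counter = itertools.count()
    pre = []
    flat = flatten(expr, pre.append, lambda: f"t{next(counter)}")
    return pre, flat
```

The preceding statements are emitted innermost first, so every temporary is defined before it is used.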

Statement Splitter 

The statement splitter is the first of two MDP built-in optimization filters. This filter converts 
associative built-ins such as + and and with more than two arguments into chains of 
two-argument built-in calls, removes type-assertion statements which are no longer needed, 
and expands many primitives and hcodes such as cas, make-closure and force into their 
components. 

1 The global class name and accessors to its global-value slot are all undef'd just after they are created, so they 
cannot be referenced by user Concurrent Smalltalk programs, and no name conflicts can result. 
2 See Tables 5-3 and 5-4. 
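The conversion of a many-argument associative built-in into a chain of two-argument calls might look like the following Python sketch. The statement triple `(destination, operator, operands)` and the left-to-right chaining order are assumptions for the example; the source does not say in which order the statement splitter chains the operands.

```python
def split_nary(op, args, target, fresh):
    """Rewrite target = (op a b c ...) as a chain of two-argument calls.
    Valid only for associative operators."""
    if len(args) <= 2:
        return [(target, op, tuple(args))]
    statements = []
    acc = args[0]
    for i, arg in enumerate(args[1:], start=1):
        last = i == len(args) - 1
        dest = target if last else fresh()  # intermediate results go to temporaries
        statements.append((dest, op, (acc, arg)))
        acc = dest
    return statements
```

With four operands, two temporaries are needed and the final call writes the original target.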

Built-in Optimizer 

The second MDP built-in optimization filter is the built-in optimizer. This optimizer reduces 
the strength of some built-in operations such as multiplication and division by converting 
them into logical shifts using the identities in Table A-4. 
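As an illustration of this kind of strength reduction, the Python sketch below rewrites multiplication or division by a positive power of two as a shift. The identities actually used are those of Table A-4, which is not reproduced here, so the cases handled and the operator names `shift-left` and `shift-right` are assumptions.

```python
def reduce_strength(op, x, constant):
    """Return a cheaper (op, x, amount) triple when constant is a
    positive power of two; otherwise return the operation unchanged."""
    is_pow2 = constant > 0 and (constant & (constant - 1)) == 0
    if not is_pow2:
        return (op, x, constant)
    shift = constant.bit_length() - 1  # log2 of the constant
    if op == "*":
        return ("shift-left", x, shift)
    if op == "/":
        return ("shift-right", x, shift)
    return (op, x, constant)
```

Note that a right shift agrees with division that rounds toward negative infinity; whether that matches the division semantics for negative operands on the MDP is not stated in the text and would need to be checked against Table A-4.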

The built-in optimizer is followed by another call to the touch eliminator, which is able to 
eliminate more touches than it could previously. At this point the touch eliminator can depend 
on built-ins not being optimized out, so it can remove touches of values which are subsequently 
used by built-ins. For example, if a touch of a is immediately followed by an application 
of + to a and b, the touch can be eliminated; it could not have been eliminated before 
because the + might have been eliminated or another statement inserted between the touch 
and the +. 

Instance Variable Target Transformer 

This transformation and the following two correct quirks in the MDP and Cosmos architec- 
tures. One restriction of the Cosmos design is that the targets of full-fledged applications can 
only be local variables in the context; applications other than built-ins cannot store their re- 
sults directly into instance variables or into locals in places other than the context. The in- 
stance variable target transformer scans for application statements that store their results in 
instance variables and modifies them to store the results in local variables and then move 
them into the instance variables. 

Grab Introducer 

The grab introducer generalizes the instance object access mechanism in Optimist. While 
Optimist could access at most one instance object in a lambda, Optimist II can access many. 
Unfortunately, there is only one MDP address register, ID2, assigned to holding pointers to 
instance objects. Hence, before every statement s that might access an instance object, the 
grab introducer checks the value of ID2 left from the previous statement; if that value is in- 
correct, the grab introducer inserts a grab statement just before s to put the right object into 
ID2. If s accesses many instance objects, the grab introducer inserts moves and uses other 
statement-specific techniques to make s access only one instance object; doing this well can 
become quite an involved process for some hcodes. 
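The basic bookkeeping can be sketched in Python for straight-line code in which each statement accesses at most one instance object. The statement pairs and the `grab` tuple are hypothetical stand-ins for hcode; control-flow merges and statements that access several instance objects, which the real grab introducer must handle, are left out.

```python
def introduce_grabs(statements):
    """statements: list of (text, instance_object or None).
    Inserts ("grab", obj) before any statement whose instance object
    is not the one currently assumed to be in ID2."""
    out = []
    id2 = None  # object assumed to be in ID2; None means unknown
    for text, obj in statements:
        if obj is not None and obj != id2:
            out.append(("grab", obj))
            id2 = obj
        out.append((text, obj))
    return out
```

Statements that reference no instance object leave ID2 untouched, so no grab is emitted for them.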

The grab introducer also generalizes the instance object part of the Context Optimization 
transformation found in Optimist— if an instance object is not referenced, there is no need to 
point ID2 to it and possibly force it to migrate. 

Cfuture Parameter Eliminator 

The cfuture parameter eliminator complements the instance variable target transformer by 
eliminating application statements that store their results back in a lambda's parameters. 
Unlike Optimist, Optimist II allows functions and methods to use their parameters just like 
any other local variables and, in particular, to write into them. However, the operating system 
does not support cfutures in a function's parameter area. Hence, if the cfuture parameter 
eliminator finds a parameter p used as a target of a full-fledged application, it creates a new 
local variable l, emits a move to copy p into l upon entry to the function, and substitutes l for 
every use of p in the function. 
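A Python sketch of this substitution is given below. The statement triple `(target, op, sources)`, the use of `"apply"` to mark a full-fledged application, and the `"move"` operator name are assumptions for the example.

```python
def eliminate_cfuture_params(params, statements, fresh):
    """statements: list of (target, op, sources). An op of "apply" stands
    for a full-fledged application; anything else is treated as a built-in."""
    targeted = {t for t, op, _ in statements if op == "apply"}
    # One new local for each parameter that an application writes into.
    rename = {p: fresh() for p in params if p in targeted}

    def sub(v):
        return rename.get(v, v)

    # Copy each affected parameter into its new local on entry.
    entry_moves = [(rename[p], "move", (p,)) for p in params if p in rename]
    body = [(sub(t), op, tuple(sub(s) for s in srcs)) for t, op, srcs in statements]
    return entry_moves + body
```

Parameters that are never the target of an application are left alone, so no extra moves are introduced for them.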

Enter/Exit Introducer 

The last two filters are another call to the local eliminator and the introduction of enter and 
exit hcodes at the beginning and end of the lambda, respectively. The compiler will compile 
these hcodes to the entry and cleanup code for the lambda. 

3.4. Code Generation 

The third phase of Optimist II contains the hcode compiler, assembly optimizer, and assem- 
bler. The hcode compiler compiles hcode into an assembly language module, which is a di- 
graph of assembly language statements. The assembler and the assembly optimizer then in- 
sert branches into the module and perform peep-hole optimizations on it. Since the hcode 
compiler, assembly optimizer, and assembler were all present in Optimist, only the differ- 
ences will be described here. 

New Hcode Compiler Features 

The hcode compiler has been updated for CLOS, the new Concurrent Smalltalk, and the new 
Architecture version 11B. Major Concurrent Smalltalk changes affecting the compiler in- 
clude introduction of multiple values to application statements and the introduction of many 
built-ins which compile into MDP system calls or sequences of MDP instructions. Built-ins 
are provided even for such low-level facilities as reading or checking tags, and they are 
accessed by the Concurrent Smalltalk runtime system. 

The context and variable allocation schemes have changed somewhat. Optimist's graph-col- 
orer for allocating context local variables worked well and has been extended to also allow 
slots in the message to be reused as local variables; thus the slots in the incoming message 
and the slots in the context form a pool of slots to which the compiler can allocate local vari- 
ables at will. The only restriction imposed by Cosmos is that local variables which might 
contain cfutures cannot be assigned to incoming message slots. 

Unlike JOSS, Cosmos fixes the locations of the saved registers in the context. If a function 
would need more slots than the fourteen provided in a standard context, Optimist II assigns 
the extra locals to slots after the saved register area in the context 1 , up to a limit of 53 total 
slots; the MDP cannot readily address more than 64 words in an object, and 11 are used for 
overhead. If a large context is needed, Optimist II emits code to create it when the function 
starts execution and dispose of it when it is done. 

One architectural change had considerable impact on all stages in the third phase. Architec- 
ture 11B allows long immediate constants and long displacement into objects on most two- 
operand instructions but not three-operand ones. Optimist II takes advantage of these oper- 
ations whenever possible, but handling the worst case possibilities is now more complicated. 
For example, it is no longer true that an add instruction allows the same addressing modes 
as a neg. 

New Assembler Features 

The assembler has been upgraded to output many kinds of objects instead of just code. When 
it encounters the use of a pointer to an object inside another object, it outputs an MDPSim 
reference to the pointed object. MDPSim resolves all of these references when it downloads 
the objects to its simulated J-Machine. 

Global Compilation 

Unlike Optimist, which compiled isolated modules, Optimist II compiles entire programs. 
Hence, it has the additional duty of emitting the "glue" that holds programs together. In par- 
ticular, it emits class definitions, method tables, data objects, and code objects. It emits all 
class definitions first because they are needed to load other objects. The order of the other 
objects does not matter because MDPSim can resolve references in any order. After emitting 
objects, Optimist II emits code that automatically downloads the objects into the J-Machine. 

1 See Figure 4-9. 

Identifiers 

Since MDPSim currently allows only alphanumeric characters and underscores in its identi- 
fiers, Optimist II converts any identifier characters outside that set into strings of characters 
in that set. Next, Optimist II prepends the kind of identifier to each identifier it emits. The 
kinds are listed in Table 3-8. Finally, Optimist II checks whether another identifier with the 
same name has been emitted. If so, and if the other identifier is not eq to the current one, 
Optimist II disambiguates the current identifier by appending two underscores and a num- 
ber to it. This transformation is necessary because sometimes many anonymous functions 
are generated. 
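The three steps can be sketched in Python as below. The prefixes are the four legible entries of Table 3-8; the underscore joining prefix and name and the `_xx_` hexadecimal escape for disallowed characters are purely hypothetical, since the text does not give the actual encoding. The identity test `is not` plays the role of the eq comparison.

```python
import string

ALLOWED = set(string.ascii_letters + string.digits + "_")
PREFIXES = {"class": "c", "selector": "sel", "symbol": "sym", "function": "f"}


def make_mangler():
    emitted = {}  # mangled name -> object that first received it

    def mangle(kind, name, obj):
        # Hypothetical escape: each disallowed character becomes _xx_ (hex code).
        body = "".join(ch if ch in ALLOWED else f"_{ord(ch):02x}_" for ch in name)
        base = f"{PREFIXES[kind]}_{body}"
        candidate, n = base, 0
        # Disambiguate only when a different object already owns the name.
        while candidate in emitted and emitted[candidate] is not obj:
            n += 1
            candidate = f"{base}__{n}"
        emitted[candidate] = obj
        return candidate

    return mangle
```

Emitting the same object twice yields the same identifier, while a second, distinct object with the same source name receives a numbered suffix.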

Table 3-8. Identifier Prefixes 

    Kind            Prefix 
    Class           c 
    Selector        sel 
    Symbol          sym 
    Function        f 
    Other Object 

IDs 

To allow downloading of circular data structures, Optimist II assigns IDs to all objects it 
emits. In order to do this assignment, it has to know how many nodes there are in the J-Ma- 
chine for which it is compiling. This number is provided in the n-nodes Optimist II option. 

Optimist II uses increasing positive integers to generate serial numbers for classes, selectors, 
and symbols. Functions and other objects are assigned IDs starting with the serial number 
$7FFF and decreasing to avoid conflicts with serial numbers generated by Cosmos, which 
start at $0000 and increase. Optimist II tries to distribute the objects it creates evenly 
throughout the MDPs in the J-Machine. 
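One way to picture the assignment of IDs to functions and other objects is the Python sketch below. It assumes that an ID can be modeled as a (node, serial number) pair, that serial numbers are counted per node, and that objects are spread round-robin over the nodes; none of these details is specified above, and the class name is invented for the example.

```python
class IdAllocator:
    """Hands out (node, serial) pairs, counting serials down from $7FFF."""

    def __init__(self, n_nodes):
        self.n_nodes = n_nodes  # corresponds to the n-nodes option
        self.next_node = 0
        self.next_serial = [0x7FFF] * n_nodes  # per node, counting down

    def allocate(self):
        node = self.next_node
        self.next_node = (node + 1) % self.n_nodes  # round-robin distribution
        serial = self.next_serial[node]
        if serial < 0:
            raise RuntimeError("serial numbers exhausted on node %d" % node)
        self.next_serial[node] = serial - 1
        return (node, serial)
```

Because the compiler counts down from $7FFF while Cosmos counts up from $0000, the two ranges collide only if a node's serial number space is nearly exhausted.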

Method Tables 

Optimist II's assembler generates a method table that associates methods with classes 
and selectors. The method table is distributed among the class and selector objects loaded 
into MDPSim. Each selector (Figure 4-17) contains a list of class/method pairs that describe 
all methods defined for that selector. In addition, each class object (Figure 4-13) contains an 
ordered list of the class's ancestors. 

Together, the two objects contain enough information to deduce the method associated with a 
class and selector: first the class is looked up in the selector's list of class/method pairs; if it 
is not found, the ancestor classes are looked up, one by one, in that list. Either a binding is 
found, in which case the binding contains the desired method, or no binding is found, in 
which case there is no method associated with the given class and selector. 
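The lookup just described can be written as a short Python sketch. Plain dictionaries and lists stand in for the selector and class objects of Figures 4-17 and 4-13, and the ancestor list is assumed to be ordered from nearest to most distant ancestor; the names used are illustrative only.

```python
def lookup_method(selector_table, class_ancestors, cls, selector):
    """selector_table: {selector: [(class, method), ...]}
    class_ancestors: {class: [ancestor, ...]} in lookup order.
    Returns the bound method, or None if there is no binding."""
    pairs = selector_table.get(selector, [])
    # Try the class itself first, then each ancestor in order.
    for candidate in [cls] + class_ancestors.get(cls, []):
        for pair_class, method in pairs:
            if pair_class == candidate:
                return method
    return None
```

A class with its own binding shadows its ancestors' bindings, and a class with none inherits the first binding found among its ancestors.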

Data Formats 

The formats of built-in objects emitted by Optimist II are described in more detail in the next 
chapter. Primitive objects are listed in Figure 4-2, instance objects of user-defined classes 
are shown in Figure 4-4, and functions are shown in Figure 4-20 and closures in Figure 4-22. 
Optimist II cannot emit immediate distributed objects, but they can be created at runtime. 

3.5. Conclusion 

The main goal of writing Optimist II was to bridge the gap between Concurrent Smalltalk 
and the J-Machine. Optimist II is the first compiler that can compile a Concurrent Smalltalk 
program into code that can be run on a J-Machine without any changes. Unlike Optimist, 
which compiled only modules, Optimist II compiles entire programs, including the class hier- 
archy, method tables, functions, and immediate objects. Furthermore, Optimist II supports a 
much larger subset of Concurrent Smalltalk than Optimist. Optimist II supports the entire 
language except for full futures and I/O facilities. 

Observations 

The global optimizations included in Optimist II are very useful, as they free programmers 
from having to break abstraction barriers in order to achieve reasonable performance. This 
consideration alone was a great help in writing the runtime system. Many of the built-in 
functions such as zero? and instance variable accessors are candidates for inlining, and, in 
fact, they are often inlined into user programs. Without global optimizations, writing an efficient 
runtime system would have been difficult and error-prone. Zero? could perhaps have 
been implemented as a macro, but then it would not be possible for a user program to override 
it for its own classes. Moreover, zero? would then suffer from the classic Lisp problem 
of a macro not being a first-class data object and interchangeable with functions. Also, inlin- 
ing of functions may be controlled by fairly sophisticated heuristics, while macros would al- 
ways be expanded. 

The substitution of function calls for method calls is also a useful optimization. In simple 
programs almost all method calls are replaced with function calls and then often inlined. In 
fact, in all the simple and non-contrived examples I have compiled, Optimist II was able to 
remove all method dispatches and replace them with function calls. Even in applications using 
Lisp-style lists, there are usually at most two methods defined on an object (one method 
handles the nil case, while the other handles the nontrivial case), and Optimist II turns the 
method call into a comparison of the argument against nil followed by one of two function 
calls, often inlined. 

Generality or Simplicity? 

One recurring issue was whether Optimist II should be a compiler for a general target or a 
compiler specifically tailored to the J-Machine. Ideally, Optimist II should have a back end 
that could be replaced to compile code for a different architecture. Unfortunately, this ideal 
was not achieved. Although many MDP-specific transformations are collected near the end 
of the Optimizer, some, such as the continuation expander, had to be placed earlier in the 
compilation process. Worse, much of the runtime system at the very front of the compiler is 
heavily dependent on the MDP architecture. 

The two issues at odds here are generality and simplicity. Due to the limited scope and ex- 
perimental nature of this project, I resolved conflicts in favor of simplicity. For example, 
Concurrent Smalltalk is a useful systems programming language, and it was desirable to 
implement some features of Concurrent Smalltalk in Concurrent Smalltalk. While this ap- 
proach would make the Optimist II front end nonportable, I decided to use this approach 
anyway because it made the runtime system simple to write, understand, and modify. 

Future Plans 

Optimist II is still an evolving compiler, and it will surely change in the future. In addition 
to implementing the remaining language features and fixing bugs, Optimist II could be ex- 
tended to implement inline objects and the load balancing ideas discussed in Chapter 8. In 
addition, a number of minor tweaks mentioned in [21] are still possible. Now that branches 
have a longer range, Optimist II could be more liberal with the use of MDP register R0 to 
hold values between statements 1 . A smarter register allocator could assign a variable to a 
register for part of its lifetime. The peephole optimizer could replace branches to suspend 
instructions with SUSPEND instructions themselves. The implementation of closures could be 
made faster. The compiler could automatically detect side-effect-free and no-leak functions; 
this information might permit it to explicitly deallocate some objects such as closures if it 
could prove that they could not be referenced again. Overall, though, it seems that, except 
for loops which are deliberately broken to avoid hogging processors, no more than a few per- 
cent more performance can be squeezed out of the code generated by Optimist II; however, 
since the operating system overhead time overwhelms the execution time in Concurrent 
Smalltalk methods, there might be room for improvement through coordinated compiler and 
operating system changes. 



1 In Architecture 10, all but the shortest branches required the value of R0 to be altered, rendering that register 
practically useless for holding values between statements. 

Design Goals 

The Cosmos operating system was designed primarily as a support kernel for running Con- 
current Smalltalk programs on the J-Machine. Nevertheless, Cosmos is not specialized to 
Concurrent Smalltalk, and many of the operating system's components could be used to sup- 
port a general message-passing environment. 

The goals in designing the operating system were, in order: 

1. To make a working operating system. 

2. To make the operating system as efficient as possible. 

3. To make the operating system as simple and flexible as possible. 

The design of the operating system also had to be small enough to allow both it and most of 
the Optimist II compiler to be written in one semester; for this reason garbage collection and 
load management facilities were not included in the operating system. Several steps were 
taken to achieve goal (1), including the criticality system and the debugging techniques de- 
scribed later. The criticality system is an organized accounting method used to ensure that 
no re-entrancy problems occur when operating system routines call each other. Features 
were added to MDPSim to detect and signal race conditions known as hazards. To achieve 
goal (2), the entire operating system kernel was written in hand-optimized assembly lan- 
guage. Poor J-Machine performance can no longer be blamed solely on the operating system. 
Goal (3) was achieved by providing general data structures that are reused in many compo- 
nents of the system. 

Functionality 

The operating system assists Concurrent Smalltalk programs by providing the following ser- 
vices: 

Initialization and setup of the J-Machine. 

Providing fault handlers for faults needed to keep the J-Machine running. 

Global function calls and returns. 

Looking up methods corresponding to class/selector or object/selector pairs. 

Context allocation and deallocation facilities and conventions. 

Local and global object allocation, deallocation, lookup, and migration facilities. Mutable 
objects exist on only one node at a time, while immutable objects can exist on many nodes at 
a time; all but the primary copy can be purged when extra memory is needed. 

Support for distributed objects as defined in Concurrent Smalltalk. 

Support for Concurrent Smalltalk primitives such as determining the type of an object. 

Calls assisting in the creation and evaluation of closures. 

An integer division routine. 

Debugging and consistency-checking facilities. 

Figure 4-1. Operating System Organization 

The arrows represent calling patterns in the Cosmos operating system. Every module uses the fault handlers; 
those dependencies were omitted for clarity. The modules in bold boxes are roots— they are invoked by the user. 

The modules in the top section of the figure are written in Concurrent Smalltalk; however, the CST Runtime mod- 
ule may not necessarily be portable to other Concurrent Smalltalk implementations because it references some 
MDP data structures. The modules in the middle section are written in MDP assembly code because they imple- 
ment functionality that cannot be easily expressed in Concurrent Smalltalk. From the point of view of the rest of 
the operating system, though, these modules are indistinguishable from compiled Concurrent Smalltalk code. The 
modules in the bottom section are fixed in the memory of every MDP either because they are critical to the MDP's 
operation or because calling them as functions would be inefficient. 

After Cosmos initializes the J-Machine, a Concurrent Smalltalk program can be loaded using 
Cosmos's downloading facilities. Once the program is loaded, a single call to Cosmos's Apply 
handler can start the execution of one function in the program. Whenever a function needs 
to invoke another function or method, it first calls the Cosmos ObjectNode routine 1 to determine 
a good node for that invocation and then sends an Apply message or one of its variants 
to that node. The target node, upon receiving that message, executes the Cosmos Apply 
handler that fetches the function or method code and calls it. 

Many functions need to store local state in memory, either because they need more variables 
than will fit in the MDP's registers or because they make function or method calls and need a 
place in which to save state for the duration of the call. Cosmos uses contexts to save state 
and provides routines to allocate and deallocate them. 

In addition, Cosmos manages objects globally, migrating objects and code to the nodes that 
need them. Cosmos keeps only one instance of each mutable object, but it can make copies of 
immutable objects and code. Also, Cosmos provides routines to determine the type of an ob- 
ject and to create and address distributed objects. Finally, Cosmos provides primitives such 
as division that would be hard to implement in Concurrent Smalltalk. 

Structure 

The operating system is composed of interacting modules shown in Figure 4-1. The high- 
level modules are built in layers out of lower-level ones; however, the low-level modules are 
deeply interrelated because of the hardware restrictions of the MDP. Furthermore, due to 
efficiency considerations and hardware restrictions on faulting, much of the code in some of 
the managers is inlined inside other managers. This is especially common at the lowest 
levels such as the heap and context managers. 

Reading Guide 

This chapter describes the handlers in the two lower sections of Figure 4-1; the Concurrent 
Smalltalk code is described in Chapter 3. After a brief overview, the handlers will be de- 
scribed in this chapter from the bottom level up. 

Heap Manager 

The heap manager manages the heap on each MDP. The heap allows allocation, dealloca- 
tion, and purging of arbitrary objects in the local memory on the MDP. All object references 
are bounds-checked, and primitive compaction facilities are provided. 

BRAT Manager 

The BRAT manager keeps track of the BRAT— Birth/Residence Address Table [38]. The 
BRAT is an associative table used mainly for translating virtual addresses to physical ad- 
dresses, although it is also used for some housekeeping tasks in object migration. 

Object Manager 

The object manager combines the facilities of the heap manager and the BRAT manager to 
provide a virtual name space for the objects allocated by the heap manager. The object man- 
ager is capable of allocating objects on the local node and giving them unique names. It can 
also determine that an object does not reside on the local node, but it cannot access nonlocal 
objects. 

Context Manager 

The context manager keeps track of contexts. A context is the MDP equivalent of an invocation 
descriptor on a conventional computer. The context contains values of the local variables 
of a process, saved data and ID register values, and the instruction pointer (IP) when a 
process suspends. 

1 Sometimes that call is optimized by Optimist II to a single MOVE from NNR instruction. 

Global Object Manager 

The global object manager is an extension of the object manager to the global virtual address 
space of the J-Machine. The global object manager can access nonlocal objects, and it can 
migrate objects between nodes. It can distinguish mutable objects from immutable ones and 
maintain copies of the latter on many nodes. 

The global object manager also can determine the class of an arbitrary object, and it is the 
lowest level in the operating system that implements distributed objects. 

Method Manager 

The method manager implements an association between classes, selectors, and methods on 
top of the global object manager. The method manager can, given a class and a selector, 
quickly determine the appropriate method that represents applying the selector to an object 
of that class. 

Control Manager 

Function and method calls and replies are dispatched by the control manager. Every func- 
tion or method call is actually a message send to an entry point in the control manager, 
which interprets the incoming message, makes sure it is valid, fetches the called code, and 
runs it. The control manager also handles suspending after cfuture faults and resuming 
when a called function or method returns a value. 

Utilities 

The operating system kernel includes commonly-used utilities that would suffer too much 
overhead if they had to be called via the standard function call mechanism. The current util- 
ities include a divide system call and calls that create and evaluate closures. 

MDP Runtime 

The MDP runtime system contains other utilities that have to be coded in MDP assembly 
language. Currently MDP runtime utilities include a method table lookup routine and func- 
tions that create distributed objects. When arrays are implemented, they will also be imple- 
mented as MDP runtime utilities. 

CST Runtime 

The CST runtime system contains utilities which could be coded in Concurrent Smalltalk. 
These utilities implement most of the functions and macros listed in the Concurrent 
Smalltalk reference manual (Appendix A), including locks, some array code, and object-han- 
dling functions such as copiers and destructors, as well as lower-level functionality such as 
global variables. 

Data Representation 

Figure 4-2 shows an overview of the representations of various Concurrent Smalltalk objects. 
The representations of the complex Concurrent Smalltalk objects such as functions, selectors, 
and classes will be explained in more detail in the following sections. 

Figure 4-2. Concurrent Smalltalk Object Representations 

Primitive objects are represented as above using the MDP's 32-bit words with 4-bit tags. Objects not shown 
above are represented as standard objects using the ID tag. Due to a shortage of tags, nil, symbols, classes, 
selectors, and characters share the same MDP tag, TAG0 (also known as SYM), and are distinguished by the upper 
four bits of the data word. One MDP tag, TAGA, has been retained for future expansion. 

With the current bit layouts, Cosmos is limited to representing 268435456 symbols, 65536 classes, 65536 selectors, 
65536 futures, 32768 objects per node, and 32768 distributed objects in the entire system. The last three 
limitations are especially severe and will be considered in Chapter 8. 

4.1. Hardware Building Blocks 

Figure 4-3. MDP Memory Organization 

The data structures above are replicated on every MDP in the J-Machine. All of the data structures except the 
heap reside in fast RAM. The top of the heap resides in fast RAM, but most of it is in slow RAM. 

Figure 4-3 maps the structures addressable in the physical address space of every MDP. The 
heap occupies most of memory and is used for storing and keeping track of Concurrent 
Smalltalk objects and contexts. The BRAT root table is a separate hash table that points to 
the BRAT entries in the heap. The XLATE table is a table used for hardware-assisted asso- 
ciative lookups. In addition, every MDP contains a copy of the Cosmos code and fault vector 
assignments and a small set of globals used by Cosmos and some of the runtime routines. 
Finally, every MDP contains two hardware-managed incoming message queues.

Priorities

Each MDP provides three levels of execution priority: background, priority 0, and priority 1.
The network allows messages to be sent at priority 0 or 1; when a message of a given priority
arrives at a destination node, it is queued in the appropriate priority's queue. The queues 
are constantly monitored by the CPU, and if a queue contains a higher-priority message than 
the task currently running, the current task is pre-empted to handle the message. 

Cosmos currently uses only the background and priority 0 levels. It is anticipated that priority 1
will be used in the future for garbage collection and resolving emergencies such as
queue or memory overflow. In addition, on a real J-Machine (as opposed to MDPSim), priority 1
will make a good debugging channel. Cosmos's use of the background priority is currently
limited to initialization. It would be nice if background mode could be used for incremental
heap compaction, but that may be difficult: because of flaws in the MDP architecture,
the background priority and priority 0 share the same sets of globals, ID and fault registers,
and fault vectors, meaning that execution of a priority 0 message is likely to clobber
the state of a background process.

4.2. The Cosmos Kernel

Criticalities 

Cosmos was fairly difficult to write because almost all of its routines are non-reentrant; thus, 
locations of faults inside Cosmos code have to be carefully controlled. The MDP does not in- 
clude any stacks, which means that each routine and fault handler must save its state in a 
different set of global variables. Furthermore, the low-level routines have to be very careful 
not to alter the same global or register through some combination of system calls and faults. 
Another class of problems consists of critical sections of code in which physical addresses are 
manipulated in data registers or objects are referenced assuming they are present in the lo- 
cal memory. No heap compaction or object migration is allowed in those sections. If a heap 
compaction or object migration were to occur in such a section, the physical address or object 
reference would become invalid. 

To make these problems tractable (but, nevertheless, still difficult), the concept of a critical- 
ity was introduced. The criticality of a system call is a number which reflects what actions 
that system call is allowed to perform. The criticalities are listed in Table 4-1. 

A routine with a given criticality may not call another routine with a lower one. For exam- 
ple, if a routine is sending a message, it may not make a system call or allow a fault of criti- 
cality less than 4 while it is sending the message. Thus, the routine has to force any poten- 
tial cfutures before sending the message, because a cfuture fault has criticality 1. If a routine 
stores a physical address of a heap block in a data register, it must have criticality at least 5 
as long as the address can be read out of the data register. If a routine runs with the MDP's
fault bit set, it must have criticality at least 6 to prevent a catastrophic double fault. There 
will be no re-entrancy problems as long as each routine's criticality is correct, the criticality 
rules are obeyed, and all possible faults are anticipated. 
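The calling rule above can be written as a one-line check. The following is only an illustrative sketch in Python, not Cosmos code: the function name `may_invoke` and the constant names are hypothetical, while the numeric values are the ones quoted in the text and in Table 4-1.

```python
# Criticality levels follow Table 4-1: 0 is least restrictive, 7 most restrictive.
MIN_CRITICALITY = 0
MAX_CRITICALITY = 7

# Values quoted in the text; the constant names are hypothetical.
CFUTURE_FAULT = 1
SENDING_MESSAGE = 4
HOLDING_PHYSICAL_ADDRESS = 5
FAULT_BIT_SET = 6


def may_invoke(caller: int, callee: int) -> bool:
    """Return True if a routine running at criticality `caller` may call,
    or fault into, a routine whose criticality is `callee`."""
    for level in (caller, callee):
        if not MIN_CRITICALITY <= level <= MAX_CRITICALITY:
            raise ValueError(f"criticality out of range: {level}")
    # The rule from the text: a routine may not call one with a lower criticality.
    return callee >= caller
```

For example, `may_invoke(SENDING_MESSAGE, CFUTURE_FAULT)` is false, which is why a routine must force any potential cfutures before it starts sending a message.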

Table 4-1. Criticalities

Value  Actions Allowed
0      All actions are allowed. Caller's registers do not have to be preserved.
1      Caller's registers must be preserved. May suspend, so MDP's globals are not preserved.
2      No suspending faults, no modification of context state.
3      No suspending faults, no modification of context state, no object migration.
4      No message sends, no object migration.
5      No heap compaction, no message sends.
6      No faults or system calls, no heap compaction, no message sends.
7      No priority 1 interrupts, no faults or system calls, no heap compaction, no message sends.

Heap Manager

The heap manager manages the heap on each MDP, allowing allocation, deallocation, and
purging of arbitrary objects in the local memory on the MDP. The heap manager does not
use the network, so most of its routines run at criticality 5.

Heap Structure

The heap, shown in Figure 4-3, is organized as a contiguous block of memory. Objects are
allocated from the bottom (lower addresses) up, while BRAT entries are allocated from the
top down by the BRAT manager. The objects in the heap between FixedHeapStart and
HeapStart are nonrelocatable: once allocated, they are never moved. Currently that area
is used for storing a few fast contexts. The rest of the heap is dynamically divided between
relocatable objects and BRAT entries. The FirstFree pointer points to the first unused
word of heap memory, while the LastFree pointer points to the first word used for BRAT
entries.

Figure 4-4. A Heap Block

Each MDP heap block consists of a header and ID words followed by user-defined data. The flags shown in the
figure indicate whether the object is a fast context, free, copyable, purgeable, locked, or marked.

Heap Blocks 

Each heap block has the structure shown in Figure 4-4. The presence of the length of the 
block in the first word and its virtual ID in the second word allows the heap to be scanned 
and compacted quickly. 

The heap manager uses only the free, purgeable, and marked flags, which have the following 
meanings: 

• Free. The heap manager will reclaim storage from those blocks when it needs extra 
memory. 

• Purgeable. The heap manager can purge those blocks when it needs extra memory. 

• Marked. A purgeable block is marked if it has not been accessed for a while. It will be 
purged at the next opportunity. 

The copyable and locked flags are managed by the global object manager, while the context
manager uses the fast context flag to distinguish fast contexts from standard ones.

Object Allocation

Allocating an object on the heap is usually quite fast, taking about twenty instructions.
Given the object ID and header word, the AllocObject heap manager routine checks
whether there is enough room in the heap for the object¹. If so, it initializes the object's
first two words, advances the FirstFree pointer, and returns a relocatable ADDR-tagged word
pointing to the physical memory that will be occupied by the object. If
there is not enough free memory, AllocObject calls the heap compactor to try to free
enough memory for the object.
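The allocation path described above amounts to a bump allocator with a small reserve. The sketch below is a Python model written for illustration only; the class and method names are hypothetical, the compactor is a placeholder, and the returned tuple merely stands in for an ADDR-tagged word. The three-word reserve and the limit of two compactions come from the text.

```python
BRAT_ENTRY_WORDS = 3   # reserve described in the footnote to the text


class HeapExhausted(Exception):
    """Stands in for the system halting when memory cannot be found."""


class Heap:
    def __init__(self, size: int):
        self.words = [0] * size
        self.first_free = 0      # first unused word; objects grow upward
        self.last_free = size    # first word used by BRAT entries, which grow downward

    def compact(self) -> None:
        """Placeholder: a real compactor would slide blocks toward HeapStart."""

    def alloc_object(self, header, object_id, length: int):
        """Allocate `length` words (header and ID words included)."""
        if length < 2:
            raise ValueError("a block needs at least a header and an ID word")
        needed = length + BRAT_ENTRY_WORDS
        for _attempt in range(2):            # the text allows two compactions
            if self.last_free - self.first_free >= needed:
                break
            self.compact()
        else:
            if self.last_free - self.first_free < needed:
                raise HeapExhausted(f"cannot allocate {length} words")
        base = self.first_free
        self.words[base] = header            # first word of the block
        self.words[base + 1] = object_id     # second word of the block
        self.first_free = base + length
        return ("ADDR", base, length)
```

Keeping the reserve inside the size check is what lets a later BRAT entry allocation succeed without a compaction at the wrong criticality.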

Heap Compaction 

The heap compactor is called whenever a memory request cannot be satisfied. First it invalidates
all relocatable addresses cached in the address registers and the XLATE table². Then
it scans through the heap starting from HeapStart, moving each block as far to the front of
the heap as possible. As each block is moved, its physical address is updated in the BRAT,
but not the XLATE table³. Deleted blocks are not copied, nor are marked purgeable blocks⁴.
If a purgeable block was unmarked, it is copied and then marked. The next time the block is
referenced, that block's marked bit will be cleared by the XLATE fault handler.

A heap compaction increases the amount of contiguous available memory between 
FirstFree and LastFree. However, if the compaction did not free enough memory to sat- 
isfy the allocation request, another compaction is immediately done. The second compaction 
purges the remaining purgeable blocks from the heap. If the second compaction does not free 
enough memory, the system halts. 
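The two-pass behaviour described above can be modelled compactly. The following Python sketch is an illustration under stated assumptions, not the Cosmos compactor: blocks are modelled as records rather than memory words, the BRAT is a plain dictionary from object ID to address, and dropping the BRAT binding of a discarded block is an assumption that the text does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Block:
    object_id: str
    length: int
    free: bool = False
    purgeable: bool = False
    marked: bool = False
    locked: bool = False


def compact(blocks, brat, heap_start=0):
    """Run one compaction pass; return (surviving blocks, new FirstFree)."""
    survivors = []
    address = heap_start
    for block in blocks:
        droppable = block.free or (block.purgeable and block.marked)
        if droppable and not block.locked:       # locked blocks are always preserved
            brat.pop(block.object_id, None)      # assumption: binding dropped here
            continue
        if block.purgeable and not block.marked:
            block.marked = True                  # purged next time unless referenced
        brat[block.object_id] = address          # record the block's new address
        survivors.append(block)
        address += block.length
    return survivors, address
```

Calling `compact` twice in a row without any intervening reference removes every unlocked purgeable block, which mirrors the second compaction described in the text.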

Utility Routines 

The heap manager contains a couple of general-purpose utility routines which illustrate cre- 
ative use of the MDP's fault mechanism. One, BlockMove, quickly moves a block of memory 
from one address to another. The routine uses straightline code followed by an infinite loop 
to copy data. The loop is terminated by a LIMIT fault when a copy is attempted of the first 
word out of bounds of the source block. Similarly, BlockSend quickly sends words of an ob- 
ject until terminated by a LIMIT fault. Without using LIMIT faults these routines would be 
two to four times slower. 
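The idea behind BlockMove, letting a bounds fault end the loop instead of testing a counter on every word, has a loose analogue in Python, where an out-of-range index raises `IndexError`. The sketch below is only an analogy: it says nothing about MDP instruction timing, and the function name and signature are hypothetical.

```python
def block_move(source, dest, dest_start):
    """Copy all of `source` into `dest` starting at `dest_start`,
    without an explicit end-of-block test. Returns the number of words copied."""
    i = 0
    try:
        while True:                            # deliberately unbounded loop
            # source[i] is evaluated first and raises IndexError one word past
            # the end of the source block, much as a LIMIT fault would.
            dest[dest_start + i] = source[i]
            i += 1
    except IndexError:
        return i
```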

BRAT Manager 

The BRAT manager maintains the BRAT (Birth/Residence Address Table) [38] and the
XLATE table. The BRAT is a general-purpose associative table used mainly for translating
virtual addresses to physical addresses. The XLATE table is used mostly as a cache for the 
BRAT table. Table 4-2 lists the associations currently maintained by the BRAT manager. 
Like the heap manager, the BRAT manager does not use the network and runs mostly at 
criticality 5. 

¹ Actually, AllocObject makes sure that there are three more free words in the heap than necessary to hold the
object in case a BRAT entry will also be allocated for the object. This avoids the difficult situation of being able to
allocate a heap object but not its BRAT entry; a heap compaction in the BRAT manager would violate criticality
rules.

² Just re-entering each association between a virtual ID and the new physical address would not work because several
virtual IDs may alias to the same physical object; the copying code would find only one such association in the
XLATE table.

³ Physical addresses are not updated in the XLATE table because if they were, there would be no easy way of
determining which blocks were referenced between heap compactions. The XLATE fault handler clears the marked bit
of every block it encounters without a binding in the XLATE table.

⁴ Nevertheless, if an object's locked flag is set, the object is preserved, even if it is also indicated as deleted or
purgeable and marked. This action is required to maintain consistency in the global object manager.

The format of the XLATE table is dictated by the MDP hardware. The table is a two-way set-associative
cache whose position and length are specified by the MDP TBM register. Each
binding in the XLATE table consists of a key word and a data word. Invalid bindings have a
NIL data word. The XLATE and PROBE instructions hash the key they receive into the
XLATE table and check whether either of the two possible bindings contains the right key; if so,
the corresponding data word is returned. The ENTER instruction enters a new binding into
the XLATE table; that binding might overwrite an existing binding of a different key, so the
XLATE table is only a cache: bindings are not guaranteed to remain in the table. The hash
function used is the exclusive-or of the four bytes that constitute the data portion of the key
word; the tag of the key word does not participate in the hashing. Thus, the XLATE table is
limited by hardware to 512 bindings (256 hash values with two bindings each), which may not
be enough if there are many small objects on a node.
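The XLATE lookup described above can be modelled in a few lines. This Python sketch is illustrative only: the class and method names are hypothetical, keys are modelled as (tag, data word) pairs, and the policy of alternating between the two ways when both are occupied is an assumption, since the text does not say which binding ENTER overwrites.

```python
NIL = None
NUM_SETS = 256          # one set per 8-bit hash value
WAYS = 2                # two-way set-associative


def xlate_hash(data_word: int) -> int:
    """XOR of the four bytes in the 32-bit data portion of a key word."""
    data_word &= 0xFFFFFFFF
    return (data_word ^ (data_word >> 8) ^ (data_word >> 16) ^ (data_word >> 24)) & 0xFF


class XlateTable:
    def __init__(self):
        # Each way is a [key, data] pair; a NIL data word marks an invalid binding.
        self.sets = [[[NIL, NIL] for _ in range(WAYS)] for _ in range(NUM_SETS)]
        self.next_victim = [0] * NUM_SETS   # assumed replacement policy

    def enter(self, tag: str, data_word: int, value) -> None:
        index = xlate_hash(data_word)       # the tag does not take part in hashing
        ways = self.sets[index]
        key = (tag, data_word)
        for way in ways:                    # update an existing binding of this key
            if way[1] is not NIL and way[0] == key:
                way[1] = value
                return
        for way in ways:                    # otherwise fill an invalid way
            if way[1] is NIL:
                way[0], way[1] = key, value
                return
        victim = self.next_victim[index]    # otherwise overwrite another key's binding
        ways[victim][0], ways[victim][1] = key, value
        self.next_victim[index] = 1 - victim

    def probe(self, tag: str, data_word: int):
        key = (tag, data_word)
        for way in self.sets[xlate_hash(data_word)]:
            if way[1] is not NIL and way[0] == key:
                return way[1]
        return NIL
```

Three keys whose bytes XOR to the same value compete for two slots, so entering the third evicts one of the first two. This is the cache behaviour that makes the BRAT necessary as a backing table.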

Table 4-2. XLATE and BRAT Associations

"Virtual" Tag   "Physical" Tag   Tables        Association
ID              ADDR             XLATE, BRAT   Physical object location
ID              INT              BRAT          Node number of node containing object
ID              context ID       BRAT          Context waiting for object
DID             ADDR             XLATE         Physical location of nearest constituent
TAG0:SEL        ADDR             XLATE, BRAT   Physical location of selector object
TAG0:CLASS      ADDR             XLATE, BRAT   Physical location of class object
TAG0:CLASS      INT              BRAT          Node number of node containing class object
TAG0:CLASS      context ID       BRAT          Context waiting for class object
TAG0:SYM        none             XLATE         Symbols are primitive objects
TAG0:CHAR       none             XLATE         Characters are primitive objects
INT             none             XLATE         Integers are primitive objects
BOOL            none             XLATE         Booleans are primitive objects
FLOAT           none             XLATE         Floating point numbers are primitive objects
CS (INST1)      ID               XLATE         Class/selector lookup

The above table contains the current associations kept in the virtual tables. A general object (tagged ID or
TAG0:CLASS) can associate either to a physical address, the node number of the node thought to contain the
object, or a context waiting for the object. In the last case, if the object is being accessed, the current process
suspends and puts itself onto the list of contexts waiting for the object. Selector objects are just like general
objects except that they do not migrate. The DID→ADDR association is used for quickly getting to constituents of
distributed objects from the group ID. The results of the DID→ADDR lookup must be consistent through time: looking up
a DID on the same node must always yield the same constituent. Looking up a primitive object other than the
ones just mentioned in the XLATE table must always miss. Finally, due to a shortage of virtual tags, words tagged
INST1 are used as class/selector keys to the method manager's method cache.

XLATE and BRAT Table Formats 

Unlike the XLATE cache, entries in the BRAT table are guaranteed to remain in the table 
until they are deleted. As shown in Figure 4-7, the BRAT table is rooted by a small root hash 
table. Each entry in the root table points to a linked list of BRAT bindings with keys that 
hash to the same value. In addition, there is a linked list of free BRAT entries. There are 
several advantages to keeping the BRAT table organized this way instead of the flat hash 
table in [38]: 

• Deleting entries from the BRAT is easy, while at the same time searching the BRAT for a 
missing key is fast. Such searches are common because they occur almost every time an ob- 
ject not present in local memory is referenced. 

• The boundary between BRAT memory and the memory used for objects in the heap is ad- 
justed dynamically. Thus, accurate predictions of the average size of an object needed in [38] 
become unnecessary. 

• No memory is wasted keeping the flat hash table no more than 70% full. On the other 
hand, linked lists require one additional word per BRAT entry for the links; however, it is 
conceivable that BRAT entries could be stored contiguously with their objects, eliminating 
this waste.

Figure 4-5. XLATE Table Format

The XLATE table's position and length are specified by the MDP TBM register. The XLATE table is a two-way set- 
associative cache composed of key/data pairs of words. A NIL data value specifies an invalid entry. The XLATE 
and PROBE instructions provide hardware support for quickly looking up keys in the cache.

Figure 4-6. BRAT Entry Format

Each BRAT entry is a linked list entry associating a key word to a data word. 

BRAT Routines 

There are three main routines for managing the BRAT table. They are: 

• EnterBinding, which enters a new binding of a key to a data word. This routine uses a 
binding from the BRATFree linked list whenever possible. However, if that list is empty, 
memory is allocated from the back of the heap, moving LastFree forward by three words, 
which might force a heap compaction. 

• LookupBinding, which returns the data word associated with a key or NIL if there is 
none. 

• DeleteBinding and PurgeBinding, which remove a binding from the BRAT. The 
binding must have been present in the BRAT. In addition, PurgeBinding removes the bind- 
ing from the XLATE table. 
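The BRAT organization and routines described above correspond to a chained hash table with a free list. The sketch below is a Python model for illustration: the method names mirror the routine names in the text, but the root table size, the use of `key % size` as the root hash, and the `entries_carved` counter (which stands in for moving LastFree by three words) are assumptions.

```python
class BratEntry:
    __slots__ = ("link", "key", "data")     # three words per entry


class Brat:
    def __init__(self, root_size: int = 16):
        self.root = [None] * root_size      # small root hash table
        self.free = None                    # head of the BRATFree list
        self.entries_carved = 0             # entries taken from the back of the heap

    def _bucket(self, key: int) -> int:
        return key % len(self.root)         # assumed hash function

    def enter_binding(self, key: int, data) -> None:
        if self.free is not None:           # reuse a free entry whenever possible
            entry = self.free
            self.free = entry.link
        else:                               # otherwise take three words from the heap
            entry = BratEntry()
            self.entries_carved += 1
        entry.key, entry.data = key, data
        bucket = self._bucket(key)
        entry.link = self.root[bucket]
        self.root[bucket] = entry

    def lookup_binding(self, key: int):
        entry = self.root[self._bucket(key)]
        while entry is not None:
            if entry.key == key:
                return entry.data
            entry = entry.link
        return None                         # NIL: no binding for this key

    def delete_binding(self, key: int) -> None:
        bucket = self._bucket(key)
        previous, entry = None, self.root[bucket]
        while entry is not None and entry.key != key:
            previous, entry = entry, entry.link
        if entry is None:
            raise KeyError(key)             # the binding must have been present
        if previous is None:
            self.root[bucket] = entry.link
        else:
            previous.link = entry.link
        entry.link = self.free              # push the entry onto the free list
        self.free = entry
```

A miss only walks one short chain, and a deleted entry is reused by the next `enter_binding` call without touching the heap boundary.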

Heap Compaction 

The current heap compactor in the Heap Manager does not attempt to compact free BRAT 
entries linked on the BRATFree list. Thus, once memory is used for a BRAT entry, it can 
only be used for another BRAT entry. Nevertheless, performing such compaction by moving 
BRAT entries up in memory would not present any special difficulties.

Figure 4-7. BRAT Table Format

The BRAT entries are kept in linked lists rooted in the BRAT hash table. Free BRAT entries
are linked in a separate linked list.



Object Manager 

The local object manager combines the facilities of the heap manager and the BRAT manager
to provide a virtual name space for the objects allocated by the heap manager. The local object
manager can allocate objects on the local node and give them unique names. The local
object manager is tightly interwoven with the global object manager, so the distinction between
the two managers is only conceptual: their code is inlined together in common routines.

Figure 4-8. Object ID Formats

Words with the above formats are virtual addresses of objects on the heap. Special care must be taken when 
handling virtual addresses which are also futures to avoid forcing them prematurely. 

Object IDs 

The Object Manager recognizes several formats of object IDs and virtual addresses, as shown 
in Figure 4-8. In addition, the Object Manager can generate unique new standard object IDs 
by incrementing a local serial number counter and adding it to the local node number. Since
no mechanism exists currently for reclaiming IDs, the system will fail after 32768 local objects
have been allocated at one node. See Chapter 8 for a discussion of what could be done
about this problem.

Each of the IDs in Figure 4-8 contains a home node number in the lowest 16 bits. For fu- 
tures, standard objects, and distributed object constituents, the home node number is merely 
the network number of the MDP that serves as the object's home; any unused bits must be 
zero. However, for classes and selectors, any of the lowest 16 bits not used for storing the 
network number are used to distinguish among several class or selector objects sharing the 
same home. For example, on a 1024-node J-Machine arranged as 16x16x4, bits 0-3, 5-8, and 
10-11 hold the home's x, y, and z coordinates, respectively, while bits 4, 9, and 12-15 disam- 
biguate among classes or selectors living on the same node. For this configuration, the class 
object's home node number can be obtained by logically ANDing the class number with 
%0000110111101111. See Figure 5-12 for more on this. 

Why not use bits 16-27 to disambiguate classes and selectors living on the same node, as is 
done for objects? The reason is that several parts of Cosmos require class and selector num- 
bers to be no greater than 16 bits. For instance, a class number is stored in every heap ob- 
ject's header, and the class and selector numbers are concatenated to make a 32-bit word 
during method lookup. 
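The bit manipulation for the 16x16x4 example can be checked directly. The Python sketch below uses the mask and bit positions given in the text; the function names are hypothetical, and placing the class number in the upper half of the 32-bit method lookup word is an assumption, since the text says only that the two numbers are concatenated.

```python
HOME_NODE_MASK = 0b0000110111101111   # %0000110111101111 from the text


def home_node(class_number: int) -> int:
    """Strip the disambiguation bits (4, 9, and 12-15) from a class number."""
    return class_number & HOME_NODE_MASK


def coordinates(class_number: int):
    """Return the (x, y, z) coordinates of the home node."""
    x = class_number & 0xF            # bits 0-3
    y = (class_number >> 5) & 0xF     # bits 5-8
    z = (class_number >> 10) & 0x3    # bits 10-11
    return x, y, z


def method_key(class_number: int, selector_number: int) -> int:
    """Concatenate two 16-bit numbers into one 32-bit word (ordering assumed)."""
    if not (0 <= class_number < 1 << 16 and 0 <= selector_number < 1 << 16):
        raise ValueError("class and selector numbers must fit in 16 bits")
    return (class_number << 16) | selector_number
```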

Routines 

The local object manager provides routines to allocate and deallocate objects. The object-allocating
routine has two variants: AllocNewObject allocates an object given its ID and
header word, while AllocNextObject takes a header word and generates a new ID for the
object. Both variants then allocate local memory for the object and enter the binding of the
ID to the physical address in the BRAT and the XLATE tables. AllocNextObject is used
for most of the general object-allocating needs, while AllocNewObject is used in special
cases (downloading of objects or allocation of distributed object constituents) where an object's
ID is predetermined.

DeallocateObject, the local object deallocator, deletes an object's bindings from the BRAT
and the XLATE tables and sets the object's deleted flag. Thus, the object will be compacted 
during the next heap compaction. If the object was a distributed object constituent, it might 
have had more than one binding in the XLATE table; only one such binding is deleted, so it 
might still be possible to access a deleted constituent object through the other bindings until 
the second heap compaction. This is not an error because the consequences of accessing a 
deleted object in Concurrent Smalltalk are undefined. 

The object manager also provides a handler for XLATE faults. When an XLATE instruction 
that searches for a local object misses, the object manager searches the BRAT for the binding. 
If it finds such a binding, it returns the object's physical address and enters the object's 
binding back in the XLATE table. This is also the point at which the heap manager unmarks 
the object if it was previously marked. If the object's binding was not found in the BRAT,
further action depends on the value of the XLATE action code¹: the XLATE fault handler
might use the global object manager to bring the object onto this node, return NIL, or fail.

Context Manager 

The context manager maintains contexts, which contain the local variables, saved register
values, and messages of processes. The structure of a context is shown in Figure 4-9. The MDP's
register ID1 contains a virtual address of the current context at all times when a context
switch is possible, while A1 contains the physical address and length of the context. Contexts
are used for the following purposes:



¹ The XLATE action code tells the XLATE handler what the user of the XLATE instruction wanted to accomplish.
The action code conveys information such as whether the caller really needed to reference an object (and the object
should be brought locally if it isn't present) or the caller only wanted to tell if the object exists.

Figure 4-9. Context Format

Standard and fast contexts have the above format except that they are only 25 words long, while long contexts can 
be up to 64 words long (the MDP only allows convenient addressing of the first 64 words of an object). 

There is no saved ID1 field because ID1 points to the context itself, so it has to be known by whatever routine is 
resuming the context. 

The link field is used for several purposes. Contexts on the FastContextQueue are linked together by their link
fields. When a process suspends execution, the resumption condition is stored in the link field: if the process 
suspended because it read a cfuture from a local variable in the context, the offset (tagged CFUT) of that local 
variable is stored in the link field. If the process suspended because it referenced a non-local object, the context is 
put on a linked list of contexts waiting for the object rooted at the object's BRAT binding. The old data value of the 
BRAT binding is placed in the link field of the last context waiting for the object. Since the data value of a BRAT 
entry can be an integer, the INT tag cannot be used to represent contexts waiting for cfutures. 

• When a function calls another function, it stores a cfuture in a local variable in its con- 
text and then proceeds to fault on that cfuture. The reply from the called function will store 
its value into the designated local variable, overwriting the cfuture. 

• When evaluation of a function needs to be suspended for any reason, including a cfuture 
fault, the function's registers are saved in a context. 

• When evaluation of a function is suspended, the message that invoked that function is
copied into the beginning of the context (except for the first two words of the message, which
are then lost). When the function resumes, A3, the register which originally pointed to the
message, is aliased to point to the context to allow the function to use A3 to refer to the incoming
message regardless of whether the message has been copied into the context yet or
not.

Context Availability 

There are four fundamental approaches to allocating contexts: 

1. Always allocate a context at the beginning of every function and deallocate it at the end. 

2. Allocate a context at the beginning of a function that needs a context and deallocate it at 
the end. 

3. Lazily allocate contexts only when necessary. 

4. Always keep a context allocated, even when no message is being processed. 

Approaches 1 and 2 are commonly used for stack frames on stack-based computers. Initially 
I chose approach 3 for the context allocation strategy. Approaches 1 and 2 are simpler but
have the disadvantage of often allocating unnecessary contexts: most of the leaf nodes of
computations do not require contexts, and allocating contexts unnecessarily is a considerable
overhead. Approach 3 worked by storing an invalid address in A1, the MDP's context address
register. When a context was needed, the access through A1 would fault, and a context
would be allocated. However, I ran into two difficulties with approach 3: allocating contexts
through faulting on A1 was slow because determining the cause of an INVADR fault on the
MDP is quite involved, and there were some difficult code sections in the object manager
where a fault might allocate a context, violating criticality rules.

Due to the above difficulties, I switched to approach 4, which combines the advantages of
lazy context allocation with the advantages of always allocating a context. In approach 4,
when a function finishes executing, it does not deallocate its context¹; thus, the next message
that arrives does not have to allocate a context. There are two places where approach 4
involves a little more work than approach 3: when a function suspends on a cfuture or object
migration wait, it must allocate a new context to avoid having its own context overwritten;
and when the value of a cfuture is returned or an object arrives, the currently allocated context
must be deallocated and replaced with the suspended function's context. The additional
context allocation on a cfuture or object migration wait is not a significant penalty because it
occurs on the tail end of message processing, so it does not affect the latency of message processing
until the J-Machine is fully loaded. The context deallocation on the reception of a
cfuture value or an object does add to the latency, but context deallocation is always fast: it
takes only four instructions.

To avoid reentrancy and criticality problems, the value in register A1 is required to be always
valid; therefore, any routine, such as the heap compactor, which might invalidate A1 must
recalculate the value of A1 when it is done.

Kinds of Contexts 

There are three kinds of contexts: fast contexts, standard contexts, and long contexts. A 
fixed number of fast contexts is preallocated when an MDP is initialized. Each fast context is 
25 words long. The fast contexts are nonrelocatable heap objects between FixedHeapStart 
and HeapStart. The physical addresses of these contexts never need to be invalidated, so 
these contexts are especially fast. Fast contexts are never deallocated. Enough of these con- 
texts should be allocated to serve a normal computation load on an MDP; the current operat- 
ing system allocates eight per MDP, which is probably too few. 

Standard contexts are like fast contexts in that they are 25 words long, but they are relocatable
objects allocated from the main heap area; thus a heap compaction invalidates their
physical addresses. Unlike fast contexts, the storage occupied by standard contexts can be
reclaimed.

¹ This only applies to functions which use 25-word contexts; functions which use long contexts must deallocate their
contexts and allocate 25-word contexts upon exiting.

Fast contexts and standard contexts are eligible to be queued on a linked list of free contexts 
rooted by the global variable FastContextQueue. Whenever a 25-word context is desired, 
FastContextQueue is checked first; if it contains a context, that context is unlinked from 
the queue and used. Otherwise, a standard context is allocated. When a fast context is disposed,
it is linked back on the queue. When a standard context is disposed, it is either linked
back on the queue or deallocated, at the caller's discretion. These queue operations are
fast: allocating a context from the queue takes five instructions, while deallocating one onto
the queue takes four.
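The queue discipline described above is an ordinary linked free list. The Python sketch below illustrates it under stated assumptions: the class and method names are hypothetical, the `keep_standard` flag models the caller's discretion, and the `released_to_heap` counter stands in for handing a standard context back to the heap manager.

```python
class Context:
    def __init__(self, fast: bool):
        self.fast = fast
        self.link = None          # link field used while on the queue


class ContextPool:
    def __init__(self, fast_count: int = 8):
        self.queue = None         # head of the FastContextQueue
        self.released_to_heap = 0
        for _ in range(fast_count):
            self._push(Context(fast=True))

    def _push(self, context: Context) -> None:
        context.link = self.queue
        self.queue = context

    def allocate(self) -> Context:
        if self.queue is not None:        # unlink a queued context if there is one
            context = self.queue
            self.queue = context.link
            context.link = None
            return context
        return Context(fast=False)        # otherwise allocate a standard context

    def dispose(self, context: Context, keep_standard: bool = True) -> None:
        if context.fast or keep_standard:
            self._push(context)           # fast contexts always return to the queue
        else:
            self.released_to_heap += 1    # standard context given back to the heap
```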

Long contexts are contexts for functions which require extra space for local variables. Long 
contexts are identical to standard contexts except that they are longer and ineligible for 
queueing on the FastContextQueue. When a function that might need a long context starts 
executing, it calls the NewContext routine, which replaces the present context with a newly 
created long context. NewContext also copies any relevant state such as the message from 
the fast context to the new, long context. A function which allocates a long context must 
terminate with a call to Suspend, which disposes the long context and allocates a new fast or 
standard context. DisposeContext can be used to dispose a context without allocating a 
new one. 

Allocation and Deallocation Calls 

The routines to allocate and deallocate 25-word contexts are short enough that they are in- 
lined whenever they are needed. The following calls are available for handling long contexts 
and the case in which the FastContextQueue is empty: 

• AllocFastContext creates a new fast context when the queue is empty. 

• Suspend checks whether a fast context was used by the routine. If so, it links it into the 
fast context queue; otherwise, the context is disposed by the heap manager, and a new 25- 
word context allocated. 

• NewContext allocates a new long context. If a context is currently in use, it is deallo- 
cated after the message has been copied from it to the new context. 

• DisposeContext is like Suspend except that it does not allocate a new 25-word context.

Suspending and Resuming Processes 

When a process must be suspended because it tried to read a cfuture, perform an operation 
on a future, add two user-defined objects together, or reference a nonlocal object, the process's 
state must be saved in its context. In particular, the values of registers that need to be pre- 
served must be stored in the context along with the IP at which execution should resume. 
Furthermore, the reason for suspending must be stored in the link field of the context; otherwise,
the context might be restarted prematurely, which would lead to a disaster if the context
was waiting for an object¹. Finally, a new 25-word context is allocated in A1 and ID1 to
prevent the suspended context from being reused.

When a process is to be resumed, the resuming event is checked against the context's link 
field to make sure that the context should, in fact, be resumed. If it should, the existing con- 
text in Al and ID1 is deallocated, and the values of the registers and IP read from the con- 



1 The reason why restarting a context early would crash the computer is not obvious. The problem is not that the 
process would access a bad object or value — the process would fault and suspend again because it still cannot refer- 
ence the nonlocal object. Instead, the system crash would occur because if a context that had been waiting for mi- 
gration of a nonlocal object were restarted early, the context would not be unlinked from the list of contexts waiting 
for the object. The Reply handler would not even be aware that the context had been present on the linked list. 
Then, when the context's process faulted again on the missing object, it would be added to the list of contexts wait- 
ing for the object a second time, corrupting that list. 
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text. If a resource for which several processes were waiting arrives, one of these processes is 
resumed immediately, while the other ones are resumed later by RestartContext messages 
(Figure 4-10) which the node sends it itself. A RestartContext message deallocates the 
existing context in Al and ID1 and then restarts the specified context. 

Figure 4-10. RestartContext Message

The RestartContext message restarts the context specified by the ID. The context must be present on the tar- 
get node. 

Reclaiming Contexts 

The current strategy for reclaiming free contexts by the heap compactor is somewhat hap- 
hazard. Fast contexts are never reclaimed. Long contexts are always reclaimed because 
they are required to be deallocated before their processes can exit. On the other hand, stan- 
dard contexts are reclaimed only if enough processes call Suspend when they are done; 
otherwise, once a standard context is allocated, it is never deallocated. This may be an ad- 
vantage because once a working set of fast and standard contexts is allocated on an MDP, al- 
location of 25-word contexts will always be fast. If the lack of regular deallocation of stan- 
dard contexts turns out to be a problem, it would only be a simple modification to the heap 
compactor to have it scan the FastContextQueue and deallocate any standard contexts it 
finds there. 

Global Object Manager 

The primary means of invoking the global object manager is through the local object manager 
when the latter cannot find a local object. The global object manager extends the local object 
manager to the global virtual address space of the J-Machine. Together, the two managers 
provide an integrated facility for efficiently managing objects globally on the J-Machine. The 
managers can distinguish mutable objects from immutable ones and cache copies of the im- 
mutable objects on many nodes. 

Data Structures 

Every object on the J-Machine has a home node. The home node most likely created the ob- 
ject, and that node has the responsibility of keeping track of the object's location throughout 
the object's life. Objects may migrate from node to node, but the object must inform the home 
node of every such move. If a node needs an object and does not know where it is, it asks the 
home node. Certain objects such as contexts, selectors, and immutable objects do not mi- 
grate, so such objects can always be found at their home nodes. The address of the home 
node is usually encoded in the lowest 16 bits of an object's ID (see Figure 4-8). This is a con- 
venient format because the network ignores the upper 16 bits of a routing address, so mes- 
sages may be sent to an object's home node simply by transmitting the object's ID as the 
routing word. 
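
The addressing convention can be illustrated with a minimal sketch. It assumes that an ID can be treated as a plain 32-bit integer; the tag bits and the exact layout of Figure 4-8 are ignored, and the function names are hypothetical rather than Cosmos routines.

```python
NODE_BITS = 16                       # assumption: home node occupies the lowest 16 bits
NODE_MASK = (1 << NODE_BITS) - 1

def home_node(object_id: int) -> int:
    """Return the home node number encoded in an object ID."""
    return object_id & NODE_MASK

def network_destination(routing_word: int) -> int:
    """Model of the network, which ignores the upper 16 bits of a routing word."""
    return routing_word & NODE_MASK

def route_to_home(object_id: int) -> int:
    """Send to the home node by using the ID itself as the routing word."""
    return network_destination(object_id)
```

Because the network discards the upper bits, no masking is needed on the sending side; the object ID is transmitted unchanged.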

In addition to the flags used by the local object manager, each object has three more
flags: copyable, purgeable, and locked. An object is copyable if it is immutable. Many
primitive objects are immutable, as are objects belonging to classes declared immutable by the
Concurrent Smalltalk programmer. Furthermore, the compiler might be able to determine
that objects of a particular class cannot be mutated and mark them copyable, although the
compiler does not perform this optimization at this time. When a copy of a copyable object is
made, the copy is marked purgeable. Thus, many copies of immutable objects can be made,
and the heap compactor can reclaim storage used by copies that are no longer needed.
Setting the locked flag prevents an object from migrating or being deleted during critical
protocol sections.

Figure 4-11. Object XLATE Table and BRAT Entries

There are five possible BRAT table states for a particular object. Each object must have a BRAT entry on its 
home node. The XLATE table entry, where specified, is optional. The states are as follows: 

I. The object does not exist on this node, and its whereabouts are unknown. 

II. The object exists on this node. Its physical address is given. 

III. The object does not exist on this node, but it is believed to reside on the node specified by the integer. 

IV. The object does not exist on this node, but the contexts linked to its BRAT entry are waiting for its arrival. 

V. The object does not exist on this node, but the contexts linked to its BRAT entry are waiting for its arrival, and 
the object is believed to reside on the node specified by the integer. 

Only states II, III, and V are allowed on an object's home node, while only states I, II, and IV are allowed on the 
other nodes. 

Every object must always have a BRAT entry on its home node. The BRAT entry can be in
one of the states shown in Figure 4-11. When an object is initially allocated, its BRAT entry 
is in state II. If an object is in state I on its home node, that object does not exist, and any at- 
tempt to access it halts the system. 
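
The five states of Figure 4-11 and the restriction on where each may occur can be written down as a small sketch. The enumeration names are invented here for illustration; only the state numbers and the allowed sets come from the text.

```python
from enum import Enum

class BratState(Enum):
    UNKNOWN = 1          # I:   not on this node, whereabouts unknown
    PRESENT = 2          # II:  on this node, physical address known
    FORWARD = 3          # III: not here, believed to be on a given node
    WAITING = 4          # IV:  not here, contexts are waiting for its arrival
    WAITING_FORWARD = 5  # V:   contexts are waiting and a likely node is known

# Allowed states as stated in the caption of Figure 4-11.
HOME_STATES = {BratState.PRESENT, BratState.FORWARD, BratState.WAITING_FORWARD}
OTHER_STATES = {BratState.UNKNOWN, BratState.PRESENT, BratState.WAITING}

def state_allowed(state: BratState, on_home_node: bool) -> bool:
    """Check a BRAT state against the set permitted for this kind of node."""
    return state in (HOME_STATES if on_home_node else OTHER_STATES)
```

In this model, finding state I on the home node corresponds to the "object does not exist" case, which halts the system.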

Object Migration 

The object migration protocol is a slightly simplified version of the protocol in [38]. When a
node requests an object because it does not have the object in local memory, it sends a
RequestObject message (Figure 4-12c) to the object's home node. If the home node does not
currently have the object (its BRAT entry is in state III or V), it forwards the
RequestObject message to the node thought to contain the object. If the home node does
not know about the object (BRAT state I), it halts the system. This halt is deliberate, for it
detects accesses to deleted objects. If the RequestObject message was forwarded to a node
that has the object, the message is processed there; otherwise, that node forwards the
RequestObject message back to the home node, and the two nodes keep forwarding the
message to each other. Nevertheless, since the home node is required to know an object's
whereabouts most of the time, the home node will eventually learn of the object's true
location and forward the RequestObject message to the right place.

Figure 4-12. Object Migration Messages

The AcceptObject and AcknowledgeObject messages are used only for downloading objects into the
J-Machine and for debugging. The other four messages are used for successive steps of object migration.

Figure 4-13. Object Migration Protocol

When a copy of an immutable object is made, the copy is simply sent to the requester as in part (a). If a mutable 
object has to be moved, the protocol is more complicated because the object's home node has to be kept informed 
about the object's location. 

What happens when the RequestObject message finds the object depends on whether the
object is copyable or locked. If the object is locked, the node forwards the message back to
itself; the message will be handled once the object is unlocked. If the object is copyable, the
node simply mails a purgeable, copyable copy of the object in a MigrateObject message to
the requesting node, which then installs the copy in its memory (Figure 4-13a). If not, the
protocol becomes more complicated (Figure 4-13b). The node on which the object resides
deletes the object from its memory and BRAT and sends the object to the requesting node in
a MigrateObject message. The requesting node installs the object in its memory, locks it,
and sends an UpdateHome message to the home node, telling it about the object's new
whereabouts. Finally, the home node sends an Unlock message to acknowledge receipt of the
UpdateHome message and allow the object to be moved again. Since an object might have been
marked as deleted while it was locked, the Unlock handler checks the object's deleted flag and,
if it is set, deletes the object and its BRAT entry. The last two messages are optimized out if
the requesting node happens to be the object's home node.

The object is locked in the last phase of the protocol to prevent the home node from receiving 
the UpdateHome messages from two successive migrations out of order; if that were to hap- 
pen, the home node would lose track of the object's location. Alternatively, counters could be 
used to achieve the same synchronization, but that solution would require an extra word in 
the BRAT and in the object. 
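
The message sequence for moving a mutable object can be sketched as a toy, single-threaded simulation. The Node class, its two dictionaries, and the migrate function are hypothetical stand-ins for the BRAT and the message handlers; forwarding of requests to locked objects and the deleted flag are left out.

```python
class Node:
    def __init__(self, number):
        self.number = number
        self.objects = {}   # object ID -> {"data": ..., "locked": bool}
        self.forward = {}   # home-node BRAT: object ID -> node believed to hold it

def migrate(nodes, home, holder, requester, oid):
    """Move object `oid` from `holder` to `requester`; return the messages sent."""
    trace = ["RequestObject"]               # requester -> home node (-> holder)
    obj = nodes[holder].objects.pop(oid)    # holder deletes the object locally
    trace.append("MigrateObject")           # holder -> requester
    nodes[requester].objects[oid] = obj
    if requester != home:
        obj["locked"] = True                # locked until the home node acknowledges
        trace.append("UpdateHome")          # requester -> home node
        nodes[home].forward[oid] = requester
        trace.append("Unlock")              # home node -> requester
        obj["locked"] = False
    else:
        # The object came home: the last two messages are optimized out.
        nodes[home].forward.pop(oid, None)
    return trace
```

In the sketch the lock is taken and released within one call, so it never blocks anything; on the real machine the object stays locked for the duration of the UpdateHome and Unlock round trip, which is what keeps two successive UpdateHome messages from overtaking each other.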

Object Allocation and Deletion 

An object can be allocated either at the local node or on a remote node. The NewLocalObject
system call allocates an object locally. Unlike the AllocNextObject call,
NewLocalObject takes a class as a parameter and extracts the appropriate header word
from the class object (Figure 4-14) to use for the object. Reading the class object may involve
another call to the global object manager if a copy of the class object is not present in local
memory.

Figure 4-14. Class Object Format

The instance object header word is the word that is stored as the header of every object of this class. That word is 
nil if the metaclass is primitive-class. 

In addition, each class object contains an ordered list of the class's ancestors from the most specific to the least 
specific. The class's ancestors consist of the class itself, its superclasses, its superclasses' superclasses, and so 
on; each class is listed at most once. The ancestors are ordered according to a partial order which always places 
a class before any of its superclasses; thus the class itself is always the first ancestor and object is always the 
last ancestor. 

The DisposeObject system call is used to dispose of objects, both locally and globally.
DisposeObject first tries to dispose of the object locally; if the object is locked, it is marked
as deleted but not disposed of; it will be disposed of when it is unlocked. If the object does not
reside on this node, a Dispose message is sent to the object's home node; that message follows a
route analogous to that of the RequestObject message above and will not be discussed further. If
the object is present on this node but this is not the object's home node, a DisposeBRAT message
is sent to the home node to dispose of the object's BRAT entry there. If the DisposeBRAT
message happens to find another instance of the object on its home node, it deletes that instance
too.

Figure 4-15. Object Creation and Disposal Messages

The NewObject message creates an object of the given class on the remote node and returns its ID in a Reply 
message. The Dispose message disposes an object on the remote node, while the DisposeBRAT message 
disposes an object's home BRAT entry. 

This protocol successfully deletes the single instance of a mutable object and the unpurgeable 
original of an immutable object along with, perhaps, one copy. Other copies, if any exist, of 
an immutable object are not disposed; however, they will simply be purged out if they are not 
referenced for a while. 

Other Services 

The global object manager provides two routines, ClassOf and TypeOf, that can determine
the class of any of the objects listed in Figure 4-2. If the object is a primitive object, the
global object manager returns its class directly. Otherwise, the global object manager extracts
the class from the object's header and returns it. In addition, the global object manager
provides the ObjectNode routine, which returns the node number of a node likely to contain the
object. If the object is primitive, ObjectNode returns a random node number. This system
call is frequently used in Concurrent Smalltalk to determine the node to which an
application message should be sent.

The global object manager actively participates in the process of downloading a Concurrent
Smalltalk program to the J-Machine. It provides support for installing objects on nodes
without migrating them from anywhere. If a node receives an AcceptObject message
(Figure 4-12a), it installs the object and its ID in its memory and the BRAT and responds
with an AcknowledgeObject message (Figure 4-12b) containing the object's ID.

To avoid difficulties with downloading objects recursively referencing each other, object IDs 
are assigned by MDPSim (see the section about late-binding references in [25]) before the 
objects are downloaded into the J-Machine; hence, an MDP accepting an object must also ac- 
cept the object's ID instead of generating a new one. The IDs assigned by MDPSim use serial 
numbers in the upper range of the allowed numbers, thus preventing ID conflicts with ob- 
jects generated at runtime. 

Finally, the global object manager provides support for distributed objects. This support is 
documented in the distributed object section later. 

Initialization 

Upon powerup each MDP performs the following actions: 
• Clear the address and ID registers at all priorities. 



• Clear the globals to CFUT-tagged words. If an uninitialized global is accidentally refer- 
enced, the MDP will halt because the cfuture handler can distinguish a valid cfuture from a 
CFUT-tagged word that just indicates an uninitialized value. 

• Clear the XLATE table and the BRAT root table to NIL.

• Initialize and enable the network queues, but block network message dispatching until 
initialization is done. 

• Clear the heap to CFUT-tagged words. 

• Initialize the global variables that need initializing. 

• Create eight nonrelocatable fast contexts, link them onto FastContextQueue, and ini- 
tialize HeapStart to the first word after those contexts. 

• Unlink one fast context and point priority 0's A1 and ID1 to it.

• Enable message dispatching and fall into an infinite loop in background mode. 

The version of Cosmos for running on a real J-Machine instead of MDPSim has a startup se- 
quence that also includes a self-test of the CPU, a memory test, a network test, debugging 
utilities, and a protocol to let each MDP determine its location on the network. 

Downloading Programs 

In the MDPSim emulation of the J-Machine, a special non-MDP network node called the I/O 
Node acts as the bridge between the compiler and the J-Machine. The compiler outputs an 
MDPSim script which queues a series of objects in the I/O Node. The I/O Node then sends 
AcceptObject messages to the appropriate nodes, waits for the AcknowledgeObject 
replies, and sends more objects until all objects have been downloaded. 

On the real J-Machine, Concurrent Smalltalk programs are also downloaded through an MDP
that includes special software to communicate with the outside world. Each MDP contains a
diagnostic port that lets the user halt the MDP and directly examine and change its memory
and state. The Cosmos kernel is loaded onto the MDPs through these diagnostic ports.

4.3. The Cosmos Higher-Level Facilities

Method Manager 

The method manager associates class/selector pairs with methods, although it could also be 
used for keeping general immutable associations. It provides only one routine, Lookup- 
Method, with a variant, LookupMethodU, which performs less processing of its arguments to 
make it more efficient. LookupMethod takes a class word and a selector word and attempts 
to find the method associated with them; it is the equivalent of the Concurrent Smalltalk 
method primitive. 

Figure 4-16. Class/Selector Word Format

The Class/Selector word is formed by combining a 16-bit class number with a 16-bit selector number. The word is
tagged CS (which is also the INST1 tag) to avoid conflicts with other kinds of bindings stored in the XLATE table.

LookupMethod first attempts to look up the association in the local XLATE cache. It
combines the 16-bit class and selector numbers into a single word, tags that word CS
(Figure 4-16), and looks for a binding in the XLATE table. If it finds a binding, the binding's
data word is immediately returned as the desired method. If no such binding exists, LookupMethod
sends a LookupMethod message (Figure 4-18a) to the selector's home node. The message
will invoke the LookupMethod runtime function on the selector and the class.

The LookupMethod runtime function executes on the same node as the selector object 
(Figure 4-17). Each selector has a list of methods defined for it together with their classes. 
LookupMethod first tries to find the given class in the selector object; if it finds it, it returns 
the corresponding method. If LookupMethod cannot find a method for the given class, it gets 
the class object (Figure 4-14) and searches the selector's method list for the class's ancestors 
until it either finds a method or runs out of ancestors. In the latter case the method lookup 
fails and LookupMethod returns nil. In either case LookupMethod returns the result in a 
MethodReply message (Figure 4-18b). The requesting node then associates the 
class/selector pair with the result in its XLATE table. 
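
The two-level lookup can be sketched as follows. This is a simplified model under stated assumptions: Python dictionaries stand in for the XLATE table and for each selector's method list (which Cosmos searches linearly), the message exchange is replaced by a direct function call, and all names other than LookupMethod's role are hypothetical.

```python
def lookup_method_remote(selector_table, ancestors, cls):
    """Runs on the selector's home node.
    selector_table: class -> method for one selector.
    ancestors: the class itself first, then its superclasses, most specific first."""
    if cls in selector_table:              # exact class match
        return selector_table[cls]
    for ancestor in ancestors:             # walk the ancestors in order
        if ancestor in selector_table:
            return selector_table[ancestor]
    return None                            # ran out of ancestors: lookup fails (nil)

def lookup_method(xlate, cls, selector, selectors, class_ancestors):
    """Runs on the requesting node; `xlate` models the local XLATE cache."""
    key = (cls, selector)                  # stands in for the CS-tagged word
    if key in xlate:
        return xlate[key]
    result = lookup_method_remote(selectors[selector], class_ancestors[cls], cls)
    xlate[key] = result                    # bind the MethodReply result locally
    return result
```

After the first request, repeated lookups of the same class/selector pair on a node are satisfied from the local cache without another message to the selector's home node.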

The method lookup strategy is conservative in the use of space, taking space roughly propor- 
tional to the number of methods defined in the program. However, the method lookup time 
suffers somewhat, especially when a method is requested corresponding to a deeply nested 
class and a selector with many methods defined; in the worst case the method lookup time is
proportional to the product of the number of ancestors of a class and the number of methods defined for the
selector. A binary search could have been used for searching the method table, but it would
have much worse constant factors, resulting in slower lookup for most methods, because the 
MDP does not have enough registers to support the inner loop of a binary search. 

The methods are stored in selector objects indexed by the class instead of storing them in 
class objects indexed by the selector because the number of selectors is usually much larger 
than the number of classes, and selectors tend to be accessed more uniformly than classes; 
thus, the method lookup table can be distributed more evenly on the J-Machine. 

Figure 4-17. Selector Object Format

Each selector object contains a table associating classes to methods.

Figure 4-18. Method Manager Messages: (a) LookupMethod Message; (b) MethodReply Message

The LookupMethod message requests a lookup of the class and selector to get NIL or a method ID; the Method- 
Reply message replies to the lookup. 

Control Manager 

The control manager dispatches function and method calls and handles replying from func- 
tions, a task shared with the context and global object managers. The control manager's code 
is relatively short because so much groundwork has been laid by the previous managers. 

Function and Method Dispatch 

The control manager handles three types of messages for calling functions and methods: 
Apply, ApplyFunction, and ApplySelector (Figure 4-19). The first message can be used 
for applying an arbitrary object — a function or a selector, while the other two messages can 
only be used for applying functions or selectors, respectively. The Apply handler checks the 
type of its argument and jumps into either the ApplyFunction or ApplySelector handler, 
as appropriate; the check takes three to five instructions. 

Figure 4-19. Application Messages

The Apply, ApplyFunction, and ApplySelector messages have identical formats except for the address 
stored in the header word. Each value returned by the called function corresponds to one two-word continuation 
passed to the function. The continuation specifies either the context ID and slot to which that value should be 
replied or two nils if that value is ignored by the caller. The context IDs passed to the function are not necessar- 
ily the same due to tail forwarding. 



ApplyFunction reads the ID of the function from the message, stores it in the MDP's registers
ID0 and A0 (the code segment registers), and jumps into the fourth word of the function
object (Figure 4-20). The entire process takes only 4 instructions.

ApplySelector reads the selector and the first argument (the receiver object) from the
message, uses inline code to quickly determine the class of the receiver, and calls
LookupMethodU to determine the ID of the method that should be called. If the ID is NIL,
ApplySelector halts; otherwise, ApplySelector initializes ID0 and A0 and jumps into the
fourth word of the function object. ApplySelector takes 23 instructions in the best case,
and considerably more if the class of the receiver is hard to determine or if LookupMethodU
misses in the XLATE cache.

Either of the above handlers can suspend even before the first instruction of the function is
executed if the function code or, in the case of ApplySelector, the receiver object is not
present locally. Hence, it is important that a valid context always be present in ID1 and A1. In
fact, a valid context is present in those registers, as explained in the context manager section.

Function Calls and Replies 

The control manager's other task is handling CFUT faults. There are two primary causes for 
a CFUT fault: a function accesses the result of a computation that has not finished yet, or 
any routine accesses some uninitialized variable. The control manager distinguishes these 
two cases by the data in the CFUT-tagged word that caused the fault, which is conveniently 
stored in the MDP's FOP0 register.

If the data is positive, the fault was a cfuture fault, and the control manager stores that 
CFUT word in the current context's link field and suspends the context. The Optimist II 
compiler arranges for the data portion of the CFUT word to contain the offset of the context 
variable that was accessed; this way the cfuture handler does not have to disassemble the 
faulted instruction to determine the offset. The offset is needed later by the Reply handler 
to determine whether the context should be restarted. 

If the data in the CFUT-tagged word is zero or negative, the control manager halts the
computer because an uninitialized variable was accessed. On startup, all memory in the
MDP's heap is cleared to CFUT:-1.

Figure 4-20. Function Object Format

The function object contains the code for a function. Registers A0 and ID0 point to the function while it is
executing. The third word contains the size of the message expected by the function or NIL if the size is not known
or the function expects a variable number of arguments. The compiler initializes that word, but the operating system
does not check it against the size of the message that invoked the function; that check would add at least five
instructions to the function dispatch time.

Figure 4-21. Reply Message Format

The Reply message carries the reply value to the specified slot in the specified context. The context ID and reply 
slot may not be nil — if they were nil in the Apply message, no Reply message is sent. 

Functions return results to their callers via Reply messages (Figure 4-21). If a function re- 
turns multiple values, it sends one Reply message for each value returned. The Reply han- 
dler on the caller's node performs the following processing when it receives the message: 

1. The value from the message is stored over the cfuture in the caller's context. However, if 
the slot indicated in the Reply message did not originally contain a cfuture, the Reply han- 
dler halts because some function replied twice to the same slot or the compiler generated in- 
correct code. 

2. The CFUT-tagged link field in the caller's context is checked against the slot number of 
the newly updated slot. If the numbers match, the context is resumed; otherwise, the Reply 
handler exits because the context is waiting for some other event. 

Actually, for reasons of efficiency the check in (1) is done only if the slot number in (2) doesn't 
match. 
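
The interplay between the CFUT fault handler and the Reply handler can be sketched as below. The classes and function names are hypothetical, a Python exception stands in for halting the machine, and any CFUT-tagged word in the reply slot is accepted as a cfuture, which is a simplification.

```python
class Cfut:
    """A CFUT-tagged word; for a valid cfuture, `data` is the slot offset."""
    def __init__(self, data):
        self.data = data

class Context:
    def __init__(self, size):
        self.slots = [Cfut(-1)] * size   # uninitialized memory is CFUT:-1
        self.link = None                 # reason for suspension, if suspended
        self.running = True

def cfut_fault(ctx, word):
    """Control manager's response to a fault on a CFUT-tagged word."""
    if word.data <= 0:
        raise RuntimeError("halt: uninitialized variable accessed")
    ctx.link = word                      # remember which slot is awaited
    ctx.running = False                  # suspend the context

def reply(ctx, slot, value):
    """Reply handler: store the value; resume only if this slot was awaited."""
    awaited = isinstance(ctx.link, Cfut) and ctx.link.data == slot
    # As in the text, check (1) is performed only when the slot numbers differ.
    if not awaited and not isinstance(ctx.slots[slot], Cfut):
        raise RuntimeError("halt: slot replied to twice")
    ctx.slots[slot] = value
    if awaited:
        ctx.link = None
        ctx.running = True
```

A reply to a slot other than the awaited one fills in the value but leaves the context suspended, which is the behavior that the link field exists to guarantee.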

Utilities 

The operating system kernel currently contains three utilities: a divide routine, a closure
maker, and a closure evaluator. The Divide system call divides one integer by another and
returns the quotient and remainder using the sign conventions described in Appendix A. The
divide routine includes considerable overhead to evaluate all signed 32-bit results correctly,
including special cases such as dividing -$80000000 by 1 or -1, because a large-integer
package might be implemented on top of the normal integer arithmetic routines sometime in
the future.

NewClosure, the closure maker, allocates and returns a new closure object (Figure 4-22) on 
the local heap. The caller should then initialize the closure's display arguments before using 
the closure. 

CallClosure is the function called by a closure when it is invoked as a function. CallClo- 
sure calls the function specified in the closure with the additional display arguments in the 
closure. 
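
The relationship between the two closure routines can be sketched in a few lines. The argument lists shown for new_closure and call_closure are assumptions made for illustration; the actual system-call interfaces are not specified here.

```python
class Closure:
    def __init__(self, function, n_display):
        self.function = function             # the function named in the closure
        self.display = [None] * n_display    # to be initialized by the caller

def new_closure(function, n_display):
    """Allocate a closure; the caller fills in the display arguments afterwards."""
    return Closure(function, n_display)

def call_closure(closure, message_args):
    """Forward the message with the closure's display arguments appended."""
    return closure.function(*message_args, *closure.display)
```

For example, a closure over a three-argument function with two display arguments behaves like a one-argument function once the display arguments have been stored in it.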

It is true that Divide and NewClosure could have been implemented as functions instead of 
system calls; however, these routines are used frequently enough and are short enough that 
it was decided that it would be best to make them readily available whenever they are 
needed. The additional overhead that would be required in making a function call is compa- 
rable to the time it takes to divide two numbers or allocate a new closure object. 

Figure 4-22. Closure Format

Closures are treated just like functions by Concurrent Smalltalk and the control manager. When the control
manager calls a closure, it executes the instruction at offset 3, which is a CallClosure system call. That system
call forwards the message, appended with the display arguments included in the closure, to the function whose ID is
specified in the word at offset 4 in the closure.

MDP Runtime 

The MDP runtime system contains utilities for which it is not important that they reside on 
every node. Currently the MDP runtime system includes a method lookup routine and two 
routines that allocate distributed objects and are described below. 

Distributed Objects 

A distributed object is an object composed of many constituents. A message sent to the group 
name of a distributed object arrives at a constituent chosen by the operating system; the 
hope is that the operating system chooses the constituents evenly enough so as not to over- 
load some constituents and underutilize others. In addition, each constituent of a distributed 
object is itself a Concurrent Smalltalk object. 

Distributed objects are supported by the global object manager and the MDP runtime sys- 
tem. The MDP runtime system handles allocation of distributed objects, while the global ob- 
ject manager handles accessing constituents of distributed objects. 

Implementation

Each distributed object is implemented solely as a set of constituent objects; there is no 
"group" data for a distributed object anywhere in the system. The group name of a dis- 
tributed object contains enough information to permit quickly finding the ID of any of its 
constituents as well as a convenient way to find a nearby constituent. The structure of the 
group name is shown in Figure 4-23. 

Figure 4-23. Distributed Object Group ID

The group ID (DID) contains the distributed object's serial number, linear "home" node number (explained in
Figure 4-24), and a signed base-2 logarithm of the distributed object's stride, which is the ratio S of the number of
nodes in the J-Machine to the physical number of constituents. Both the physical number of constituents and the
number of nodes in the J-Machine must be powers of two. The Lg(S) field is signed and 5 bits long, ranging from
-16 (S=1/65536; 65536N constituents on an N-node J-Machine) to 15 (S=32768; 1 constituent for every 32768
nodes) by powers of two. The linear home node number H must be less than S. The kth constituent, counting
from k=0, is located on the node with the linear number H+⌊kS⌋.

If the stride S is 1 or greater, each constituent object has the same serial number as the group object. If S is less
than 1, several constituents reside on every node in the J-Machine, and more than one serial number is required
to distinguish them. Hence, the distributed object reserves 1/S consecutive constituent serial numbers, and the
kth constituent has serial number N+(k mod 1/S) and resides on the node with the linear number ⌊kS⌋, where N is
the group name's serial number. H should be zero in this case.

The linear home node number is used to distribute sparse distributed objects evenly
throughout the J-Machine. The linear home node number is always zero for dense
distributed objects (ones with stride 1 or less).

The physical size of a distributed object has been constrained to be a power of two for two
reasons. First, it is desirable to be able to find any constituent from just the information
contained in the DID, and encoding an arbitrary distributed object size in the DID would require
too many bits; recording the logarithm of the size requires only five bits for any potential
size. Second, unless some radically different addressing scheme were used, distributing the
constituents of an arbitrarily sized object evenly throughout the J-Machine would require a
division operation either in the Co routine or in the PreferredConstituent routine.

A variant of the current scheme has been considered in which the constituents above the 
logical size of the distributed object are not created. The Co system call would work fine in 
such a scheme (except that its range checking would no longer be valid), but the Preferred- 
Constituent routine might return a nonexistent constituent of the distributed object, and 
since it does not know the logical size of the distributed object, it would not know that the 
constituent does not exist. It could, however, inquire at the constituent's home node, at the 
expense of complicating and slowing down the implementation of distributed objects in Cos- 
mos. This variant may be adopted if the loss of memory caused by rounding the sizes of dis- 
tributed objects up to powers of two becomes too large. 

Another consequence of rounding the sizes of distributed objects up to powers of two is that
the MDPs with high node numbers contain mostly unused constituents. This difficulty could
be alleviated by always allocating an 11-bit random "home" node number and adding that
number to the node number of the constituent modulo the size of the J-Machine, at the
expense of complicating the PreferredConstituent routine somewhat. If a J-Machine has
more than 2048 nodes, bits could be stolen from the serial number field and added to the
home node number field. To avoid placing too severe a restriction on the number of
distributed objects in the system, NewDistObj could use both the home node number and the
serial number fields to distinguish distributed objects.

Figure 4-24. Looking up a Constituent in a Sparse Distributed Object

This figure illustrates the Co system call looking up constituent 5 in a 16-constituent distributed object on a
2048-node J-Machine organized as 16x16x8. The stride is 2048/16=128, so lg(stride) is 7. Constituent 0 is located on
the node with the linear number $49. The distributed object's serial number is $1328.

Since the stride is greater than 1, the constituent number 5 is multiplied by the stride 128 and added to $49 to get
constituent 5's linear node number, $2C9. The dimensions in the linear node number are packed together to
simplify arithmetic operations; the Co system call unpacks them to get the constituent's ID.

Locating Constituents 

The Co system call implements Concurrent Smalltalk's co primitive. To find the ID of the kth
constituent of a distributed object, the global object manager shifts k left by lg(stride) bits
(a negative count shifts right) and adds the linear home node number to obtain the
constituent's linear node number. It then ANDs k with a right-justified mask of
max(-lg(stride),0) ones and adds the result to the serial number from the group object to
obtain the constituent's serial number (see Figures 4-24 and 4-25).
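
The lookup arithmetic can be sketched in a few lines of Python. This is only an illustration of the shift-and-mask computation described above, not the Cosmos implementation; the function name co and its parameter names are chosen here for the example, and the unpacking of the linear node number into network coordinates is omitted.

```python
def co(k, lg_stride, home_node, base_serial):
    """Return (linear_node, serial) for constituent k of a distributed object.

    lg_stride is the base-2 logarithm of the stride; it is negative for a
    dense object that places several constituents on each node.
    """
    if lg_stride >= 0:
        node = (k << lg_stride) + home_node      # sparse: multiply by the stride
    else:
        node = (k >> -lg_stride) + home_node     # dense: divide by 1/stride
    mask = (1 << max(-lg_stride, 0)) - 1         # max(-lg(stride), 0) one bits
    serial = base_serial + (k & mask)
    return node, serial


# The two cases shown in Figures 4-24 and 4-25.
sparse = co(5, 7, 0x49, 0x1328)
dense = co(25000, -6, 0, 0x1328)
print(hex(sparse[0]), hex(sparse[1]))
print(hex(dense[0]), hex(dense[1]))
```

With the values from the two figures, the sketch gives node $2C9 for the sparse case and node $186 for the dense case.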

When a message is sent to the group name, the translation from the group name to a
constituent object happens transparently in the global object manager. The
PreferredConstituent system call also performs this translation. Just like any
ID-to-physical-address translation, the object manager first checks the XLATE table. If it
finds a match for the DID there, it immediately returns the physical address from the XLATE
table. If not, it constructs the ID of a nearby constituent by appending the group serial
number to the local linear node number with the lowest max(lg(stride),0) bits replaced with
the lowest bits from the group linear home node number. Then the resulting constituent ID is
looked up in the usual object manager manner. If a physical address of the constituent is
found, it is entered into the XLATE table bound with the DID to accelerate the lookup next
time.
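
The construction of the nearby constituent's ID can be sketched as follows. This is a minimal Python illustration of the bit replacement only; the XLATE table and the object manager lookup are left out, the function and parameter names are hypothetical, and returning the base serial number for a dense object is an assumption made for the example.

```python
def preferred_constituent(local_node, lg_stride, home_node, base_serial):
    """Return (linear_node, serial) of the constituent preferred by local_node.

    The lowest max(lg_stride, 0) bits of the local linear node number are
    replaced by the corresponding bits of the home node number.
    """
    low_bits = max(lg_stride, 0)
    mask = (1 << low_bits) - 1
    node = (local_node & ~mask) | (home_node & mask)
    # Assumption: a dense object uses the first serial number of its block.
    return node, base_serial


# A node in the same 128-node block as constituent 5 of Figure 4-24.
node, serial = preferred_constituent(0x2F0, 7, 0x49, 0x1328)
print(hex(node), hex(serial))
```

Because the result depends only on the local node number and the group name, repeated translations on one node always select the same constituent.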

The above algorithm deterministically maps every node in the J-Machine to exactly one
constituent of the distributed object. Having such a deterministic mapping is important because
a method running on a distributed object may reference the distributed object several times
during its execution, and it is very important that it get the same constituent every time.
For example, the method might be suspended while accessing fields of a constituent. When
the method restarts and references the constituent again, it is important that it refer to the
same one. Since processes can't migrate across nodes, the function will, in fact, refer to the
same constituent every time it translates the DID to a physical address.

Figure 4-25. Looking up a Constituent in a Dense Distributed Object

This figure illustrates the co system call looking up constituent 25000 in a 131072-constituent distributed object on
a 2048-node J-Machine organized as 16x16x8. The stride is 2048/131072=1/64, so lg(stride) is -6. The home
node number should be zero in a dense object. The distributed object has a block of 64 reserved serial numbers
starting with $1328.

The constituent number 25000 is multiplied by the stride 1/64 and added to 0 to get constituent 25000's linear
node number, $186. The constituent's serial number is determined by calculating 25000 MOD 64 and adding it to
the base serial number. As before, the dimensions in the linear node number are unpacked to get the
constituent's ID.

The above mapping will utilize the distributed object's constituents uniformly if calls to the 
distributed object come from a uniform distribution of nodes, unless the stride is less than 
one, in which case only one distributed object representative is chosen per node. If the MDPs 
were arranged in a linear array, the above mapping would always yield either the closest or 
the second-closest constituent to a given node. Since the MDPs are actually arranged in a
two- or three-dimensional mesh, the mapping will tend to cluster the constituents in lines or
planes of the mesh, which may or may not produce favorable communication patterns. Overall,
though, the current mapping approach does have the advantage of simplicity, and it is
useful for small-scale J-Machines.

Allocating Distributed Objects 

(NewDistobj class:class size:integer) :distobj Function 

Distributed objects are allocated by calling the NewDistobj function in the MDP runtime
system. That function first checks whether it was called on node 0; if not, it forwards its
message to node 0, and the function is invoked there. If invoked on node 0, the function
calculates the physical size of the distributed object by rounding the given logical size, size,
up to the nearest power of two (a size that is already a power of two is unchanged). Then the
stride is computed by dividing the number of MDPs in the J-Machine by the physical size; since
the relevant numbers are all powers of two, the computations are done using base-2 logarithms.
Max(1/stride,1) consecutive distributed object serial numbers are allocated for this distributed
object, and a random home node is chosen between 0 and ⌈stride⌉-1, inclusive. A global variable
is used to maintain the next free DID number. Finally, a DID is constructed from the above
information, and a NewDistobjTree message is sent to the zeroth constituent of the distributed
object (which does not exist yet, but the Co function can calculate its ID anyway). When that
message returns, the DID is returned to the caller.
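
The size and stride calculation can be sketched in Python using base-2 logarithms throughout. This is an illustration under stated assumptions rather than the runtime system's code: the function name, the lg_nodes parameter (the logarithm of the number of MDPs), and the use of Python's random module are all choices made for the example, and DID construction is not shown.

```python
import random


def new_distobj_params(size, lg_nodes, rng=random):
    """Return (lg_physical, lg_stride, n_serials, home) for a new object.

    size is the logical size (at least 1); lg_nodes is lg(number of MDPs).
    """
    lg_physical = (size - 1).bit_length()          # round size up to a power of two
    lg_stride = lg_nodes - lg_physical             # lg(nodes / physical size)
    n_serials = 1 << max(-lg_stride, 0)            # max(1/stride, 1)
    home = rng.randrange(1 << max(lg_stride, 0))   # 0 .. ceil(stride) - 1
    return lg_physical, lg_stride, n_serials, home


# The objects of Figures 4-24 and 4-25 on a 2048-node machine (lg_nodes = 11).
print(new_distobj_params(16, 11))
print(new_distobj_params(131072, 11))
```

For the dense object the home node is always 0 and 64 serial numbers are reserved, which agrees with the description accompanying Figure 4-25.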

(NewDistobjTree class:class size:integer ID:distobj start,logDelta:integer) :null Function

NewDistobjTree creates constituents numbered start through (start + 2^logDelta - 1) of the
distributed object with the DID ID and then returns. Each constituent has group, index, and
logical size instance variables, which are initialized to the appropriate values; size is the
logical size. NewDistobjTree works by creating the constituent start if logDelta is zero or by
recursing on the two halves of its range if logDelta is positive.
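
The recursion can be sketched as below. This is a sequential Python illustration only; on the J-Machine the two halves would be handled by messages sent to other nodes, and the create callback is a hypothetical stand-in for allocating and initializing one constituent.

```python
def new_distobj_tree(create, start, log_delta):
    """Create constituents start .. start + 2**log_delta - 1 by halving the range."""
    if log_delta == 0:
        create(start)                       # a range of one: create the constituent
    else:
        half = 1 << (log_delta - 1)
        new_distobj_tree(create, start, log_delta - 1)
        new_distobj_tree(create, start + half, log_delta - 1)


created = []
new_distobj_tree(created.append, 8, 3)
print(created)
```

Each call covers a range whose size is a power of two, so the depth of the recursion is logDelta.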

The current implementation will have to be extended on a larger system so as not to bottle- 
neck node 0, but it is adequate for small and medium-range systems. 
4.4. Summary



The Cosmos operating system provides the software extension to the MDP architecture
needed to run Concurrent Smalltalk programs. The operating system consists of a kernel
resident on each MDP and a set of Concurrent Smalltalk functions written in either MDP
assembly language or Concurrent Smalltalk.

The operating system is built in layers, which include the heap manager, BRAT manager,
object manager, context manager, global object manager, method manager, control manager,
utilities, and MDP and CST runtime systems. Efficiency and re-entrancy problems were
recurring issues in the design of the operating system kernel. The criticality system was
developed to deal with the re-entrancy and double faulting problems. In addition, many
routines are inlined in other routines to keep efficiency reasonable and to avoid double faults
and re-entrancy problems. In some cases a system call cannot call another system call but
can use it inlined, because there are no more free data registers on the MDP and global
variables cannot be used as temporaries in routines running at criticality less than 2.

The operating system facilities were streamlined and simplified compared with those pro- 
posed in [38]. The emphasis was on making resource allocation decisions as late as possible. 
Thus, the size of the BRAT is varied dynamically at run time instead of being fixed at operat- 
ing system compile time as in [38]. The object migration protocol has been streamlined com- 
pared with the one in [38]. The resource wait table in [38] has been eliminated entirely; the 
BRAT manager is a general-purpose mechanism that can perform the same task better. 

Finally, a scheme for quickly addressing constituents of distributed objects was designed. 
The scheme is very fast and requires only knowledge of a group ID to find either some nearby 
constituent or any given constituent. Disadvantages of the scheme include the necessity of 
rounding the size of a distributed object up to the nearest power of two and a resulting de- 
creased load on the higher-numbered MDPs in the J-Machine. Means of circumventing these 
disadvantages were explored. 
This chapter presents the progress of a simple program through the various stages of
compilation. Unfortunately, it is difficult to write a simple sample program that exercises all of the
features of a compiler. Instead of trying to write a contrived sample program that exercised 
as many features as possible, I decided that a simpler program that exercised the major op- 
timizations would make a better example. If an illustration of a more esoteric optimization is 
desired, one can write an appropriate Concurrent Smalltalk program, compile it with Opti- 
mist II, and watch the intermediate output. 

The source program, listed in Figure 5-1, returns the sum of the integers from 0 to n. Figure
5-2 shows a transcript of the interactive Optimist II session in which the program was
entered, tested on a few inputs, and then compiled.

(defmethod average integer (b: integer)
  (// (+ self b) 2))

(defmethod average boolean (b: boolean)
  false)

(defmethod rangesum integer (high)
  (if (= self high)
      self
      (let ((middle (average self high)))
        (+ (rangesum self middle)
           (rangesum (+ middle 1) high)))))

(defun sum (n)
  (rangesum 0 n))

Figure 5-1. The Rangesum Program

The sum function adds the integers from 0 to n, inclusive. The rangesum method adds the integers from self to
high, inclusive. The average method returns the average of two integers; the definition of average for booleans
was included just to confuse the compiler a bit.
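
For readers unfamiliar with Concurrent Smalltalk, the same divide-and-conquer computation can be sketched in Python. This is a sequential rendering for illustration only: it assumes that sum starts the range at 0, it omits the boolean average method, and the name sum_to is chosen here to avoid shadowing Python's built-in sum.

```python
def average(a, b):
    """Integer average, mirroring (// (+ self b) 2)."""
    return (a + b) // 2


def rangesum(low, high):
    """Sum the integers from low to high, inclusive, by splitting the range."""
    if low == high:
        return low
    middle = average(low, high)
    return rangesum(low, middle) + rangesum(middle + 1, high)


def sum_to(n):
    """Sum the integers from 0 to n, inclusive."""
    return rangesum(0, n)


print(sum_to(10), sum_to(100), rangesum(10, 13))
```

In the Concurrent Smalltalk version the two recursive calls can proceed concurrently, which is where the nconcurrently construct in the compiler's intermediate code comes from.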

CST: (+ 2 2)
#<Integer 4>
CST: (include)
#<Cst-Lambda 5024988 SUM>
CST: (sum 0)
#<Integer 0>
CST: (sum 1)
#<Integer 1>
CST: (sum 2)
#<Integer 3>
CST: (sum 10)
#<Integer 55>
CST: (average 3 5)
#<Integer 4>
CST: (average true false)
#<False>
CST: (sum 100)
#<Integer 5050>
CST: (rangesum 10 13)
#<Integer 46>
CST: (compile sum "::fact:Rangesum.mdp")
Optimizing #<Cst-Lambda 4713968 CST::SUM>
Expanded continuations
Folded constants
Forwarded replies
Optimizing #<Cst-Lambda 4711636 CST::RANGESUM>
Collapsed nconcurrentlys
Expanded continuations
Specialized local types
Deleted moves
Deleted touches
Folded constants
Optimizing #<Cst-Lambda 4709940 CST::AVERAGE>
Expanded continuations
Specialized local types
Deleted locals
Back to #<Cst-Lambda 4711636 CST::RANGESUM>
Substituted inlines
Specialized local types
Deleted moves
Deleted touches
Propagated values
Deleted dead definitions
Deleted locals
Back to #<Cst-Lambda 4713968 CST::SUM>
Deleted locals
Inserted ENTER and EXIT
Split statements
Optimized built-ins
Inserted ENTER and EXIT
Generating code
Assembling
Initialized vlocs
Printing
Assigned labels
Generating code
Assembling
Inserted branches
Initialized vlocs
Compacted SENDS
Printing
Assigned labels
#<Cst-lambda 4713968 SUM>

Figure 5-2. Rangesum Interactive Session

The Rangesum file was read in by the (include) directive, at which time the user interactively chose the file name
using a Macintosh dialog. A few functions were then tested, after which point the file was compiled.



The following sections will illustrate the actions of some of the compiler's optimizations on
the program in Figure 5-1. Please refer to Chapter 3 and [21] for explanations of the
transformations.

Initial Phase

The initial phase of the compiler first performs a few macro expansions on the input program,
compiles the program into hcode, and then performs some transformations on that hcode to get
it into a form that the rest of the compiler can use. Figure 5-3 shows the macroexpansions,
which are done by the Optimist II parser, and Figure 5-4 shows the hcode produced by the
parser. To save space, only the transformations on the rangesum method will be shown from
this point on.

Optimization Phase

The Optimist II optimization phase performs local and global optimizations on the program.
The order of the optimizations can be seen in the transcript in Figure 5-2; the compiler often
interrupts the optimization of one function to optimize another because it wants to inline the
second function in the first.

The first transformation done by the optimization phase is the collapsing of nconcurrentlys
and the expansion of continuations to the two-variable format, yielding the hcode in Figure
5-5. The threads of the nconcurrently are inlined in the function's main body, and the
nconcurrently statement is removed. Then, since an MDP continuation is actually two words (a
context ID and an offset within that context where the return value should be stored), each
continuation variable is replaced by two variables.
(defmethod rangesum integer (high)
  (if (= self high)
      self
      (let ((middle (average self high)))
        (+ (rangesum self middle)
           (rangesum (+ middle 1) high)))))

(DEFMETHOD RANGESUM INTEGER (HIGH) :: OBJECT
  (IF (= SELF HIGH)
      SELF
      (LET ((MIDDLE (AVERAGE SELF HIGH)))
        (+ (RANGESUM SELF MIDDLE) (RANGESUM (+ MIDDLE 1) HIGH)))))

(DEFMETHOD RANGESUM INTEGER (HIGH) :: (CONTINUATION :: OBJECT)
  (IF (= SELF HIGH)
      SELF
      (LET ((MIDDLE (AVERAGE SELF HIGH)))
        (+ (RANGESUM SELF MIDDLE) (RANGESUM (+ MIDDLE 1) HIGH)))))

(BEGIN
  (DEFSELECTOR RANGESUM)
  (ADD-METHOD RANGESUM INTEGER
    (METHOD-LAMBDA INTEGER (HIGH) :: (CONTINUATION :: OBJECT) SNAME RANGESUM
      (IF (= SELF HIGH)
          SELF
          (LET ((MIDDLE (AVERAGE SELF HIGH)))
            (+ (RANGESUM SELF MIDDLE) (RANGESUM (+ MIDDLE 1) HIGH)))))))

... (LAMBDA (SELF:INTEGER HIGH) :: (CONTINUATION :: OBJECT) SNAME RANGESUM
      (_WITH-OBJECT (SELF:INTEGER)
        (IF (= SELF HIGH)
            SELF
            (LET ((MIDDLE (AVERAGE SELF HIGH)))
              (+ (RANGESUM SELF MIDDLE) (RANGESUM (+ MIDDLE 1) HIGH)))))) ...

Figure 5-3. Rangesum Macroexpansion

The rangesum function is first macroexpanded through two macros that add the class of the continuation to the
defmethod syntax (see Section A.5). Then the defmethod itself is expanded into a combination of a defselector
and an add-method of a method-lambda. Later the method-lambda is expanded into a lambda.

(LAMBDA CST::RANGESUM
  (#<Parameter CST::SELF #<P-Class CST::INTEGER>
   #<Parameter CST::HIGH #<S-Class CST::OBJECT>)
  (#<Parameter CST::CONTINUATION #<Cont-Type #<S-Class CST::OBJECT>>>)
  (((LOCAL 435) #<S-Class CST::OBJECT>)
   ((LOCAL 434) #<S-Class CST::OBJECT>)
   ((LOCAL 433) #<S-Class CST::OBJECT>)
   ((LOCAL 432) #<S-Class CST::OBJECT>)
   ((LOCAL CST::MIDDLE) #<S-Class CST::OBJECT>)
   ((LOCAL 431) #<S-Class CST::OBJECT>)
   ((LOCAL 430) #<S-Class CST::OBJECT>)
   ((LOCAL 429) #<S-Class CST::OBJECT>)
   ((LOCAL CST::SELF) #<P-Class CST::INTEGER>)
   ((LOCAL CST::HIGH) #<S-Class CST::OBJECT>)
   ((LOCAL CST::CONTINUATION) #<Cont-Type #<S-Class CST::OBJECT>))
  (ASSERT-TYPE #<P-Class CST::INTEGER> (LOCAL CST::SELF))
  (APPLY ((LOCAL 429))
    (#<Built-In-Selector CST::=> (LOCAL CST::SELF) (LOCAL CST::HIGH)))
  (IF :FALSE (LOCAL 429) 2587)
  (MOVE (LOCAL 430) (LOCAL CST::SELF))
  (JUMP 2611)
  (LABEL 2587)
  (APPLY ((LOCAL 431)) ((GLOBAL CST::AVERAGE) (LOCAL CST::SELF) (LOCAL CST::HIGH)))
  (MOVE (LOCAL CST::MIDDLE) (LOCAL 431))
  (TOUCH (LOCAL CST::MIDDLE))
  (NCONCURRENTLY
    (((APPLY ((LOCAL 433))
        (#<Built-In-Selector CST::+> (LOCAL CST::MIDDLE) #<Integer 1>))
      (APPLY ((LOCAL 434)) ((GLOBAL CST::RANGESUM) (LOCAL 433) (LOCAL CST::HIGH))))
     ((APPLY ((LOCAL 432))
        ((GLOBAL CST::RANGESUM) (LOCAL CST::SELF) (LOCAL CST::MIDDLE))))))
  (APPLY ((LOCAL 435)) (#<Built-In-Selector CST::+> (LOCAL 432) (LOCAL 434)))
  (MOVE (LOCAL 430) (LOCAL 435))
  (LABEL 2611)
  (MOVE (CONT-REF (LOCAL CST::CONTINUATION)) (LOCAL 430)))

Figure 5-4. Initial Rangesum Hcode

This hcode is the final output of the initial phase. The lambda comprises the two parameters (self and high),
a return (continuation), no display parameters, a list of local variables, and a representation of the hcode digraph.

Next, the compiler starts the iterative optimizations. The first successful one is local type 
specialization, which uses type dataflow analysis to detect the fact that local 429 always 
holds a boolean value, so it changes local 429's type to boolean. 
(LAMBDA CST::RANGESUM
  (#<Parameter CST::SELF #<P-Class CST::INTEGER>
   #<Parameter CST::HIGH #<S-Class CST::OBJECT>)
  (#<Parameter CST::CONTINUATION #<Cont-Type #<S-Class CST::OBJECT>>>)
  (((LOCAL 435) #<S-Class CST::OBJECT>)
   ((LOCAL 434) #<S-Class CST::OBJECT>)
   ((LOCAL 433) #<S-Class CST::OBJECT>)
   ((LOCAL 432) #<S-Class CST::OBJECT>)
   ((LOCAL CST::MIDDLE) #<S-Class CST::OBJECT>)
   ((LOCAL 431) #<S-Class CST::OBJECT>)
   ((LOCAL 430) #<S-Class CST::OBJECT>)
   ((LOCAL 429) #<S-Class CST::OBJECT>)
   ((LOCAL CST::SELF) #<P-Class CST::INTEGER>)
   ((LOCAL CST::HIGH) #<S-Class CST::OBJECT>)
   ((LOCAL CST::CONTINUATION) #<Cont-Type #<S-Class CST::OBJECT>)
   ((LOCAL CST::CONTINUATION) #<P-Class CST::CONTEXT>)
   ((LOCAL CST::CONTINUATION) #<Disp-Type #<S-Class CST::OBJECT>))
  (ASSERT-TYPE #<P-Class CST::INTEGER> (LOCAL CST::SELF))
  (APPLY ((LOCAL 429))
    (#<Built-In-Selector CST::=> (LOCAL CST::SELF) (LOCAL CST::HIGH)))
  (IF :FALSE (LOCAL 429) 2587)
  (MOVE (LOCAL 430) (LOCAL CST::SELF))
  (JUMP 2611)
  (LABEL 2587)
  (APPLY ((LOCAL 431)) (#<Selector CST::AVERAGE> (LOCAL CST::SELF) (LOCAL CST::HIGH)))
  (MOVE (LOCAL CST::MIDDLE) (LOCAL 431))
  (TOUCH (LOCAL CST::MIDDLE))
  (APPLY ((LOCAL 433)) (#<Built-In-Selector CST::+> (LOCAL CST::MIDDLE) #<Integer 1>))
  (APPLY ((LOCAL 434)) (#<Selector CST::RANGESUM> (LOCAL 433) (LOCAL CST::HIGH)))
  (APPLY ((LOCAL 432)) (#<Selector CST::RANGESUM> (LOCAL CST::SELF) (LOCAL CST::MIDDLE)))
  (APPLY ((LOCAL 435)) (#<Built-In-Selector CST::+> (LOCAL 432) (LOCAL 434)))
  (MOVE (LOCAL 430) (LOCAL 435))
  (LABEL 2611)
  (MOVE (CONT-REF (LOCAL CST::CONTINUATION) (LOCAL CST::CONTINUATION)) (LOCAL 430)))

Figure 5-5. Hcode after Initial Transformations 

The nconcurrently statement has been broken into its threads, and two variables assigned to hold the continuation. 
The two new continuation variables have the same name as the single old continuation variable, which is still pre- 
sent, but the compiler does not get confused over variable name conflicts. 

(LAMBDA CST::RANGESUM
  (#<Parameter CST::SELF #<P-Class CST::INTEGER>
   #<Parameter CST::HIGH #<S-Class CST::OBJECT>)
  (#<Parameter CST::CONTINUATION #<Cont-Type #<S-Class CST::OBJECT>>>)
  (((LOCAL 435) #<S-Class CST::OBJECT>)
   ((LOCAL 434) #<S-Class CST::OBJECT>)
   ((LOCAL 433) #<S-Class CST::OBJECT>)
   ((LOCAL 432) #<S-Class CST::OBJECT>)
   ((LOCAL CST::MIDDLE) #<S-Class CST::OBJECT>)
   ((LOCAL 431) #<S-Class CST::OBJECT>)
   ((LOCAL 430) #<S-Class CST::OBJECT>)
   ((LOCAL 429) #<P-Class CST::BOOLEAN>)
   ((LOCAL CST::SELF) #<S-Class CST::OBJECT>)
   ((LOCAL CST::HIGH) #<S-Class CST::OBJECT>)
   ((LOCAL CST::CONTINUATION) #<Cont-Type #<S-Class CST::OBJECT>)
   ((LOCAL CST::CONTINUATION) #<P-Class CST::CONTEXT>)
   ((LOCAL CST::CONTINUATION) #<Disp-Type #<S-Class CST::OBJECT>))
  (ASSERT-TYPE #<P-Class CST::INTEGER> (LOCAL 435))
  (APPLY ((LOCAL 429)) (#<Built-In-Selector CST::=> (LOCAL 435) (LOCAL CST::HIGH)))
  (IF :FALSE (LOCAL 429) 2587)
  (JUMP 2611)
  (LABEL 2587)
  (APPLY ((LOCAL 431)) (#<Selector CST::AVERAGE> (LOCAL 435) (LOCAL CST::HIGH)))
  (APPLY ((LOCAL 433)) (#<Built-In-Selector CST::+> (LOCAL 431) #<Integer 1>))
  (APPLY ((LOCAL 434)) (#<Selector CST::RANGESUM> (LOCAL 433) (LOCAL CST::HIGH)))
  (APPLY ((LOCAL 432)) (#<Selector CST::RANGESUM> (LOCAL 435) (LOCAL 431)))
  (APPLY ((LOCAL 435)) (#<Built-In-Selector CST::+> (LOCAL 432) (LOCAL 434)))
  (LABEL 2611)
  (MOVE (CONT-REF (LOCAL CST::CONTINUATION) (LOCAL CST::CONTINUATION)) (LOCAL 435)))

Figure 5-6. Locally Optimized Hcode 

This hcode has been fully optimized using the optimizations in the original Optimist compiler. Note that due to 
move elimination the self parameter is no longer stored in the old self local; instead, a new local numbered 435 is 
now used to hold the self value. 

Afterwards, the standard dataflow optimizations described in [21] remove a few moves and a 
touch to yield the hcode in Figure 5-6. Then the constant folder realizes through type infer- 
ence that only one possible method of the rangesum and average selectors could be called, so 
it replaces the method calls with direct function calls (Figure 5-7). 
(LAMBDA CST::RANGESUM
  (#<Parameter CST::SELF #<P-Class CST::INTEGER>
   #<Parameter CST::HIGH #<S-Class CST::OBJECT>)
  (#<Parameter CST::CONTINUATION #<Cont-Type #<S-Class CST::OBJECT>>>)
  (((LOCAL 435) #<S-Class CST::OBJECT>)
   ((LOCAL 434) #<S-Class CST::OBJECT>)
   ((LOCAL 433) #<S-Class CST::OBJECT>)
   ((LOCAL 432) #<S-Class CST::OBJECT>)
   ((LOCAL CST::MIDDLE) #<S-Class CST::OBJECT>)
   ((LOCAL 431) #<S-Class CST::OBJECT>)
   ((LOCAL 430) #<S-Class CST::OBJECT>)
   ((LOCAL 429) #<P-Class CST::BOOLEAN>)
   ((LOCAL CST::SELF) #<S-Class CST::OBJECT>)
   ((LOCAL CST::HIGH) #<S-Class CST::OBJECT>)
   ((LOCAL CST::CONTINUATION) #<Cont-Type #<S-Class CST::OBJECT>)
   ((LOCAL CST::CONTINUATION) #<P-Class CST::CONTEXT>)
   ((LOCAL CST::CONTINUATION) #<Disp-Type #<S-Class CST::OBJECT>))
  (ASSERT-TYPE #<P-Class CST::INTEGER> (LOCAL 435))
  (APPLY ((LOCAL 429)) (#<Built-In-Selector CST::=> (LOCAL 435) (LOCAL CST::HIGH)))
  (IF :FALSE (LOCAL 429) 2587)
  (JUMP 2611)
  (LABEL 2587)
  (APPLY ((LOCAL 431)) ((LAMBDA CST::AVERAGE) (LOCAL 435) (LOCAL CST::HIGH)))
  (APPLY ((LOCAL 433)) (#<Built-In-Selector CST::+> (LOCAL 431) #<Integer 1>))
  (APPLY ((LOCAL 434)) ((LAMBDA CST::RANGESUM) (LOCAL 433) (LOCAL CST::HIGH)))
  (APPLY ((LOCAL 432)) ((LAMBDA CST::RANGESUM) (LOCAL 435) (LOCAL 431)))
  (APPLY ((LOCAL 435)) (#<Built-In-Selector CST::+> (LOCAL 432) (LOCAL 434)))
  (LABEL 2611)
  (MOVE (CONT-REF (LOCAL CST::CONTINUATION) (LOCAL CST::CONTINUATION)) (LOCAL 435)))

Figure 5-7. Hcode after Global Constant Propagation 

The constant propagator found that the average and rangesum method calls would always invoke the same meth- 
ods, so it replaced them with function calls. 

(LAMBDA CST::AVERAGE
  (#<Parameter CST::SELF #<P-Class CST::INTEGER>
   #<Parameter CST::B #<P-Class CST::INTEGER>)
  (#<Parameter CST::CONTINUATION #<Cont-Type #<S-Class CST::OBJECT>>>)
  (((LOCAL 424) #<P-Class CST::INTEGER>)
   ((LOCAL 423) #<P-Class CST::INTEGER>)
   ((LOCAL CST::SELF) #<P-Class CST::INTEGER>)
   ((LOCAL CST::B) #<P-Class CST::INTEGER>)
   ((LOCAL CST::CONTINUATION) #<P-Class CST::CONTEXT>)
   ((LOCAL CST::CONTINUATION) #<Disp-Type #<S-Class CST::OBJECT>))
  (ASSERT-TYPE #<P-Class CST::INTEGER> (LOCAL CST::SELF))
  (APPLY ((LOCAL 423)) (#<Built-In-Selector CST::+> (LOCAL CST::SELF) (LOCAL CST::B)))
  (APPLY ((LOCAL 424)) (#<Built-In-Selector CST:://> (LOCAL 423) #<Integer 2>))
  (MOVE (CONT-REF (LOCAL CST::CONTINUATION) (LOCAL CST::CONTINUATION)) (LOCAL 424)))

Figure 5-8. Optimized Average Hcode

The average method for integers has been optimized in an attempt to inline it inside rangesum.

Next, the optimizer attempts to inline the average and rangesum functions. Due to the
antirecursion restrictions, it cannot inline rangesum inside itself, but it is more successful with
average. In order to inline average, it first optimizes it, yielding the hcode in Figure 5-8.
Then it checks that the inlining heuristics are satisfied; they are, because the optimized
average contains only two primitive calls. Average does not perform any computation after it
replies, so all of the requirements for inlining have been satisfied. Therefore, the optimizer
inlines average inside rangesum to produce the hcode in Figure 5-9, which is optimized to the
hcode in Figure 5-10 at the end of the general optimizations.
(LAMBDA CST::RANGESUM
  (#<Parameter CST::SELF #<P-Class CST::INTEGER>
   #<Parameter CST::HIGH #<S-Class CST::OBJECT>)
  (#<Parameter CST::CONTINUATION #<Cont-Type #<S-Class CST::OBJECT>>>)
  (((LOCAL 435) #<S-Class CST::OBJECT>)
   ((LOCAL 434) #<S-Class CST::OBJECT>)
   ((LOCAL 433) #<S-Class CST::OBJECT>)
   ((LOCAL 432) #<S-Class CST::OBJECT>)
   ((LOCAL 431) #<S-Class CST::OBJECT>)
   ((LOCAL 429) #<P-Class CST::BOOLEAN>)
   ((LOCAL CST::HIGH) #<S-Class CST::OBJECT>)
   ((LOCAL CST::CONTINUATION) #<P-Class CST::CONTEXT>)
   ((LOCAL CST::CONTINUATION) #<Disp-Type #<S-Class CST::OBJECT>)
   ((LOCAL 424) #<P-Class CST::INTEGER>)
   ((LOCAL 423) #<P-Class CST::INTEGER>)
   ((LOCAL CST::SELF) #<P-Class CST::INTEGER>)
   ((LOCAL CST::B) #<P-Class CST::INTEGER>)
   ((LOCAL 455) #<S-Class CST::OBJECT>))
  (ASSERT-TYPE #<P-Class CST::INTEGER> (LOCAL 435))
  (APPLY ((LOCAL 429)) (#<Built-In-Selector CST::=> (LOCAL 435) (LOCAL CST::HIGH)))
  (IF :FALSE (LOCAL 429) 2979)
  (JUMP 2611)
  (LABEL 2979)
  (MOVE (LOCAL CST::B) (LOCAL CST::HIGH))
  (MOVE (LOCAL CST::SELF) (LOCAL 435))
  (TOUCH (LOCAL CST::B))
  (TOUCH (LOCAL CST::SELF))
  (ASSERT-TYPE #<P-Class CST::INTEGER> (LOCAL CST::SELF))
  (APPLY ((LOCAL 423)) (#<Built-In-Selector CST::+> (LOCAL CST::SELF) (LOCAL CST::B)))
  (APPLY ((LOCAL 424)) (#<Built-In-Selector CST:://> (LOCAL 423) #<Integer 2>))
  (MOVE (LOCAL 455) (LOCAL 424))
  (MOVE (LOCAL 431) (LOCAL 455))
  (APPLY ((LOCAL 433)) (#<Built-In-Selector CST::+> (LOCAL 431) #<Integer 1>))
  (APPLY ((LOCAL 434)) ((LAMBDA CST::RANGESUM) (LOCAL 433) (LOCAL CST::HIGH)))
  (APPLY ((LOCAL 432)) ((LAMBDA CST::RANGESUM) (LOCAL 435) (LOCAL 431)))
  (APPLY ((LOCAL 435)) (#<Built-In-Selector CST::+> (LOCAL 432) (LOCAL 434)))
  (LABEL 2611)
  (MOVE (CONT-REF (LOCAL CST::CONTINUATION) (LOCAL CST::CONTINUATION)) (LOCAL 435)))



Figure 5-9. Rangesum with Average Inlined 

The integer average method has just been inlined into rangesum. 



(LAMBDA CST::RANGESUM
  (#<Parameter CST::SELF #<P-Class CST::INTEGER>
   #<Parameter CST::HIGH #<S-Class CST::OBJECT>)
  (#<Parameter CST::CONTINUATION #<Cont-Type #<S-Class CST::OBJECT>>>)
  (((LOCAL 435) #<S-Class CST::OBJECT>)
   ((LOCAL 434) #<S-Class CST::OBJECT>)
   ((LOCAL 433) #<P-Class CST::INTEGER>)
   ((LOCAL 432) #<S-Class CST::OBJECT>)
   ((LOCAL 429) #<P-Class CST::BOOLEAN>)
   ((LOCAL CST::HIGH) #<S-Class CST::OBJECT>)
   ((LOCAL CST::CONTINUATION) #<P-Class CST::CONTEXT>)
   ((LOCAL CST::CONTINUATION) #<Disp-Type #<S-Class CST::OBJECT>)
   ((LOCAL 424) #<P-Class CST::INTEGER>)
   ((LOCAL 423) #<P-Class CST::INTEGER>))
  (ASSERT-TYPE #<P-Class CST::INTEGER> (LOCAL 435))
  (APPLY ((LOCAL 429)) (#<Built-In-Selector CST::=> (LOCAL 435) (LOCAL CST::HIGH)))
  (IF :FALSE (LOCAL 429) 2541)
  (JUMP 2611)
  (LABEL 2541)
  (ASSERT-TYPE #<P-Class CST::INTEGER> (LOCAL 435))
  (APPLY ((LOCAL 423)) (#<Built-In-Selector CST::+> (LOCAL 435) (LOCAL CST::HIGH)))
  (APPLY ((LOCAL 424)) (#<Built-In-Selector CST:://> (LOCAL 423) #<Integer 2>))
  (APPLY ((LOCAL 433)) (#<Built-In-Selector CST::+> (LOCAL 424) #<Integer 1>))
  (APPLY ((LOCAL 434)) ((LAMBDA CST::RANGESUM) (LOCAL 433) (LOCAL CST::HIGH)))
  (APPLY ((LOCAL 432)) ((LAMBDA CST::RANGESUM) (LOCAL 435) (LOCAL 424)))
  (APPLY ((LOCAL 435)) (#<Built-In-Selector CST::+> (LOCAL 432) (LOCAL 434)))
  (LABEL 2611)
  (MOVE (CONT-REF (LOCAL CST::CONTINUATION) (LOCAL CST::CONTINUATION)) (LOCAL 435)))

Figure 5-10. Rangesum after General Optimizations 

The rangesum hcode is now at the "Optimized Hcode" stage in Figure 3-4. 

The MDP-specific optimizations remove the assert-type hcode, reduce the division to a shift, 
and insert enter and exit hcodes to yield the final hcode in Figure 5-11. 
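
The effect of inlining average and replacing the division by a shift can be illustrated with a small Python sketch. It is a sequential analogue of the final hcode, not compiler output, and the function name is chosen for the example. Python's // and >> both round toward negative infinity, so the two forms agree for every integer; whether Concurrent Smalltalk's // rounds the same way for negative operands is not stated here and is treated as an assumption.

```python
def rangesum_final(low, high):
    """Rangesum with average inlined and (// x 2) rewritten as a shift by one."""
    if low == high:
        return low
    middle = (low + high) >> 1      # corresponds to ASH with a count of -1
    return rangesum_final(low, middle) + rangesum_final(middle + 1, high)


# The shift and the division agree on every integer in Python.
agree = all((x >> 1) == (x // 2) for x in range(-64, 64))
print(rangesum_final(10, 13), rangesum_final(0, 100), agree)
```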
(LAMBDA CST::RANGESUM
  (#<Parameter CST::SELF #<P-Class CST::INTEGER>
   #<Parameter CST::HIGH #<S-Class CST::OBJECT>)
  (#<Parameter CST::CONTINUATION #<Cont-Type #<S-Class CST::OBJECT>>>)
  (((LOCAL 435) #<S-Class CST::OBJECT>)
   ((LOCAL 434) #<S-Class CST::OBJECT>)
   ((LOCAL 433) #<P-Class CST::INTEGER>)
   ((LOCAL 432) #<S-Class CST::OBJECT>)
   ((LOCAL 429) #<P-Class CST::BOOLEAN>)
   ((LOCAL CST::HIGH) #<S-Class CST::OBJECT>)
   ((LOCAL CST::CONTINUATION) #<P-Class CST::CONTEXT>)
   ((LOCAL CST::CONTINUATION) #<Disp-Type #<S-Class CST::OBJECT>)
   ((LOCAL 424) #<P-Class CST::INTEGER>)
   ((LOCAL 423) #<P-Class CST::INTEGER>))
  (ENTER)
  (APPLY ((LOCAL 429)) (#<Built-In-Selector CST::=> (LOCAL 435) (LOCAL CST::HIGH)))
  (IF :FALSE (LOCAL 429) 2547)
  (JUMP 2611)
  (LABEL 2547)
  (APPLY ((LOCAL 423)) (#<Built-In-Selector CST::+> (LOCAL 435) (LOCAL CST::HIGH)))
  (APPLY ((LOCAL 424)) (#<Built-In-Selector CST::ASH> (LOCAL 423) #<Integer -1>))
  (APPLY ((LOCAL 433)) (#<Built-In-Selector CST::+> (LOCAL 424) #<Integer 1>))
  (APPLY ((LOCAL 434)) ((LAMBDA CST::RANGESUM) (LOCAL 433) (LOCAL CST::HIGH)))
  (APPLY ((LOCAL 432)) ((LAMBDA CST::RANGESUM) (LOCAL 435) (LOCAL 424)))
  (APPLY ((LOCAL 435)) (#<Built-In-Selector CST::+> (LOCAL 432) (LOCAL 434)))
  (LABEL 2611)
  (MOVE (CONT-REF (LOCAL CST::CONTINUATION) (LOCAL CST::CONTINUATION)) (LOCAL 435))
  (EXIT))

Figure 5-11. Final Hcode 

This is the final hcode produced before it is compiled into MDP assembly language. 

Compilation Phase 

The compilation phase compiles each hcode in Figure 5-11 into MDP assembly instructions
and then peephole-optimizes and emits the resulting code to produce the MDPSim file in
Figure 5-12. There is no need to describe the transformations here, as an appropriate
example is in [21].

The definitions of the label numbers in Figure 5-12 contain expressions of the form
LABEL cObject=(5&mX)<<sX|(5&mY)<<sY|(5&mZ)<<sZ|(5&m3)<<s3|(5&m4)<<s4|(5&m5)<<s5.
This expression means that cObject is the class with serial number 5. However, since
objects should be distributed throughout the J-Machine, the bits in the class serial number 5
are permuted to map the low-order bits onto the bits denoting the x, y, and z network
coordinates of an object. This is done by the first half of the expression,
(5&mX)<<sX|(5&mY)<<sY|(5&mZ)<<sZ; mX, mY, mZ, sX, sY, and sZ are constants defined
by the operating system and depend on the dimensions of the J-Machine. The second half of
the expression, (5&m3)<<s3|(5&m4)<<s4|(5&m5)<<s5, maps the rest of the class serial
number bits onto the remaining bits. A similar expression,
REF REV fSum=ID:(-2&mX)<<sX|(-2&mY)<<sY|(-2&mZ)<<sZ|(-2&mS)<<sS, is used to map
objects onto nodes.
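
The mask-and-shift form of these expressions can be sketched in Python. The mask and shift values below are invented purely for illustration; the real mX, sX, and related constants are defined by the operating system for a particular machine size and are not given here. The single remainder field stands in for the m3 through m5 terms.

```python
# Hypothetical (mask, shift) pairs: each mask selects some serial number bits
# and each shift moves them to their position in the permuted result.
FIELDS = [
    (0x000F, 16),   # stands in for mX, sX: serial bits 0-3 go to bits 16-19
    (0x00F0, 20),   # stands in for mY, sY: serial bits 4-7 go to bits 24-27
    (0x0700, 12),   # stands in for mZ, sZ: serial bits 8-10 go to bits 20-22
    (0xF800, 0),    # remaining serial bits stay in place
]


def permute_serial(serial, fields=FIELDS):
    """OR together (serial & mask) << shift for every field."""
    result = 0
    for mask, shift in fields:
        result |= (serial & mask) << shift
    return result


print(hex(permute_serial(5)))
```

Because the fields select disjoint source bits and land on disjoint destination bits, distinct serial numbers map to distinct results, and consecutive serial numbers differ in the coordinate fields.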

LABEL cObject=(5&mX)<<sX|(5&mY)<<sY|(5&mZ)<<sZ|(5&m3)<<s3|(5&m4)<<s4|(5&m5)<<s5
LABEL cClass=(8&mX)<<sX|(8&mY)<<sY|(8&mZ)<<sZ|(8&m3)<<s3|(8&m4)<<s4|(8&m5)<<s5
LABEL cStandard_Class=(3&mX)<<sX|(3&mY)<<sY|(3&mZ)<<sZ|(3&m3)<<s3|(3&m4)<<s4|(3&m5)<<s5
LABEL cPrimitive_Class=(2&mX)<<sX|(2&mY)<<sY|(2&mZ)<<sZ|(2&m3)<<s3|(2&m4)<<s4|(2&m5)<<s5
LABEL cDistributed_Class=(4&mX)<<sX|(4&mY)<<sY|(4&mZ)<<sZ|(4&m3)<<s3|(4&m4)<<s4|(4&m5)<<s5
LABEL cSymbol=(7&mX)<<sX|(7&mY)<<sY|(7&mZ)<<sZ|(7&m3)<<s3|(7&m4)<<s4|(7&m5)<<s5
LABEL cNull=(6&mX)<<sX|(6&mY)<<sY|(6&mZ)<<sZ|(6&m3)<<s3|(6&m4)<<s4|(6&m5)<<s5
LABEL cFunct=(17&mX)<<sX|(17&mY)<<sY|(17&mZ)<<sZ|(17&m3)<<s3|(17&m4)<<s4|(17&m5)<<s5
LABEL cSelector=(9&mX)<<sX|(9&mY)<<sY|(9&mZ)<<sZ|(9&m3)<<s3|(9&m4)<<s4|(9&m5)<<s5
LABEL cMagnitude=(18&mX)<<sX|(18&mY)<<sY|(18&mZ)<<sZ|(18&m3)<<s3|(18&m4)<<s4|(18&m5)<<s5
LABEL cCharacter=(10&mX)<<sX|(10&mY)<<sY|(10&mZ)<<sZ|(10&m3)<<s3|(10&m4)<<s4|(10&m5)<<s5
LABEL cNumber=(19&mX)<<sX|(19&mY)<<sY|(19&mZ)<<sZ|(19&m3)<<s3|(19&m4)<<s4|(19&m5)<<s5
LABEL cReal=(20&mX)<<sX|(20&mY)<<sY|(20&mZ)<<sZ|(20&m3)<<s3|(20&m4)<<s4|(20&m5)<<s5
LABEL cInteger=(11&mX)<<sX|(11&mY)<<sY|(11&mZ)<<sZ|(11&m3)<<s3|(11&m4)<<s4|(11&m5)<<s5
LABEL cBoolean=(12&mX)<<sX|(12&mY)<<sY|(12&mZ)<<sZ|(12&m3)<<s3|(12&m4)<<s4|(12&m5)<<s5
LABEL cFalse=(13&mX)<<sX|(13&mY)<<sY|(13&mZ)<<sZ|(13&m3)<<s3|(13&m4)<<s4|(13&m5)<<s5
LABEL cTrue=(14&mX)<<sX|(14&mY)<<sY|(14&mZ)<<sZ|(14&m3)<<s3|(14&m4)<<s4|(14&m5)<<s5
LABEL cFloat=(15&mX)<<sX|(15&mY)<<sY|(15&mZ)<<sZ|(15&m3)<<s3|(15&m4)<<s4|(15&m5)<<s5
LABEL cFunction=(16&mX)<<sX|(16&mY)<<sY|(16&mZ)<<sZ|(16&m3)<<s3|(16&m4)<<s4|(16&m5)<<s5
LABEL c_Closure=(21&mX)<<sX|(21&mY)<<sY|(21&mZ)<<sZ|(21&m3)<<s3|(21&m4)<<s4|(21&m5)<<s5
LABEL cContext=(22&mX)<<sX|(22&mY)<<sY|(22&mZ)<<sZ|(22&m3)<<s3|(22&m4)<<s4|(22&m5)<<s5
LABEL cDisplacement=(23&mX)<<sX|(23&mY)<<sY|(23&mZ)<<sZ|(23&m3)<<s3|(23&m4)<<s4|(23&m5)<<s5
LABEL cContinuation=(24&mX)<<sX|(24&mY)<<sY|(24&mZ)<<sZ|(24&m3)<<s3|(24&m4)<<s4|(24&m5)<<s5
LABEL cGlobal=(25&mX)<<sX|(25&mY)<<sY|(25&mZ)<<sZ|(25&m3)<<s3|(25&m4)<<s4|(25&m5)<<s5
LABEL cDistobj=(26&mX)<<sX|(26&mY)<<sY|(26&mZ)<<sZ|(26&m3)<<s3|(26&m4)<<s4|(26&m5)<<s5
REF REV selPLUS=TAG0:subSEL<<subtagN|(0&mX)<<sX|(0&mY)<<sY|(0&mZ)<<sZ|(0&m3)<<s3|(0&m4)<<s4|(0&m5)<<s5
REF REV selEQUAL=TAG0:subSEL<<subtagN|(1&mX)<<sX|(1&mY)<<sY|(1&mZ)<<sZ|(1&m3)<<s3|(1&m4)<<s4|(1&m5)<<s5
REF REV selAsh=TAG0:subSEL<<subtagN|(2&mX)<<sX|(2&mY)<<sY|(2&mZ)<<sZ|(2&m3)<<s3|(2&m4)<<s4|(2&m5)<<s5
REF REV fRangesum=ID:(-1&mX)<<sX|(-1&mY)<<sY|(-1&mZ)<<sZ|(-1&mS)<<sS
REF REV fSum=ID:(-2&mX)<<sX|(-2&mY)<<sY|(-2&mZ)<<sZ|(-2&mS)<<sS



MODULE cObject
DC MSG:hdrCopyable|cStandard_Class<<offsetN|5
DC TAG0:subCLASS<<subtagN|cObject
DC MSG:cObject<<offsetN|2
DC 1
DC TAG0:subCLASS<<subtagN|cObject
END

MODULE cClass
DC MSG:hdrCopyable|cPrimitive_Class<<offsetN|6
DC TAG0:subCLASS<<subtagN|cClass
DC NIL
DC 2
DC TAG0:subCLASS<<subtagN|cClass
DC TAG0:subCLASS<<subtagN|cObject
END

MODULE cStandard_Class
DC MSG:hdrCopyable|cPrimitive_Class<<offsetN|7
DC TAG0:subCLASS<<subtagN|cStandard_Class
DC NIL
DC 3
DC TAG0:subCLASS<<subtagN|cStandard_Class
DC TAG0:subCLASS<<subtagN|cClass
DC TAG0:subCLASS<<subtagN|cObject
END

... MODULES for the rest of the classes deleted ...

MODULE selPLUS
DC MSG:hdrCopyable|cSelector<<offsetN|3
DC {selPLUS}
DC
END

MODULE selEQUAL
DC MSG:hdrCopyable|cSelector<<offsetN|3
DC {selEQUAL}
DC
END

MODULE selAsh
DC MSG:hdrCopyable|cSelector<<offsetN|3
DC {selAsh}
DC
END



MODULE fRangesum
DC MSG:hdrCopyable|cFunction<<offsetN|28
DC {fRangesum}
DC 6
MOVE [2,A3],R0                 3
MOVE [2,A3],R3                 3.5
EQUAL R3,[3,A3],R1             4
BT R1,*L001                    4.5
ADD R3,[3,A3],R1               5
ASH R1,-1,R3                   5.5
ADD R3,1,R2                    6
MOVE R2,R0                     6.5
CALL objectNode                7
DC MSG:msgApplyFunction|6      8
SEND20 R1,R0                   9
DC {fRangesum}                 10
SEND20 R0,R2                   11
SEND0 [3,A3]                   11.5
MOVE 6,R0                      12
SEND2E0 [1,A1],R0              12.5
WTAG R0,6,R0                   13
MOVE R0,[6,A1]                 13.5
MOVE [2,A3],R0                 14
CALL objectNode                14.5
DC MSG:msgApplyFunction|6      15
SEND20 R1,R0                   16
DC {fRangesum}                 17
SEND0 R0                       18
SEND20 [2,A3],R3               18.5
MOVE 7,R0                      19
SEND2E0 [1,A1],R0              19.5
WTAG R0,6,R0                   20
MOVE R0,[7,A1]                 20.5
MOVE [7,A1],R2                 21
ADD R2,[6,A1],R1               21.5
MOVE R1,[2,A3]                 22
L001: MOVE [4,A3],R2           23
BNIL R2,*L002                  23.5
DC MSG:msgReply|4              24
SEND20 R2,R0                   25
SEND0 R2                       25.5
SEND0 [5,A3]                   26
SENDE0 [2,A3]                  26.5
L002: SUSPEND                  27
END




MODULE fSum
DC MSG:hdrCopyable|cFunction<<offsetN|10
DC {fSum}
DC 5
MOVE 0,R0                      3
CALL objectNode                3.5
DC MSG:msgApplyFunction|6      4
SEND20 R1,R0                   5
DC {fRangesum}                 6
SEND0 R0                       7
SEND0 0                        7.5
SEND0 [2,A3]                   8
SEND0 [3,A3]                   8.5
SENDE0 [4,A3]                  9
SUSPEND                        9.5
END




DOWNLOAD cObject
DOWNLOAD cClass
DOWNLOAD cStandard_Class
DOWNLOAD cPrimitive_Class
DOWNLOAD cDistributed_Class
DOWNLOAD cSymbol
DOWNLOAD cNull
DOWNLOAD cFunct
DOWNLOAD cSelector
DOWNLOAD cMagnitude
DOWNLOAD cCharacter
DOWNLOAD cNumber
DOWNLOAD cReal
DOWNLOAD cInteger
DOWNLOAD cBoolean
DOWNLOAD cFalse
DOWNLOAD cTrue
DOWNLOAD cFloat
DOWNLOAD cFunction
DOWNLOAD c_Closure
DOWNLOAD cContext
DOWNLOAD cDisplacement
DOWNLOAD cContinuation
DOWNLOAD cGlobal
DOWNLOAD cDistobj
DOWNLOAD selPLUS
DOWNLOAD selEQUAL
DOWNLOAD selAsh
DOWNLOAD fRangesum
DOWNLOAD fSum

RUN

Figure 5-12. MDPSim Output File 

Except for Cosmos, this file contains all code and data necessary to run sum on a J-Machine. The file starts with 
class number definitions, which are followed by definitions of the classes themselves, including the class hierarchy. 
The selectors are defined next, followed by code and MDPSim statements that download all of the code, selector, 
and class modules to the simulated J-Machine. The RUN command runs the J-Machine until all modules have 
been loaded. 

Only the functions and selectors necessary to run the program have been compiled. For example, neither average 
method has been included because, after optimization, neither is necessary to run sum. Similarly, all method dis- 
patches have been optimized out, so there is no need to include the definition of the rangesum selector. 

Running Rangesum 

Before rangesum can be run on MDPSim, a file holding the calls to be made needs to be
defined; the file that was used is shown in Figure 5-13. Each MESSAGE directive defines an
ApplyFunction message that can be used to call the sum function. The argument is the
third word of the message, while the fourth and fifth words contain a magic continuation that
causes the Reply message to be printed by MDPSim in the listener window. The MESSAGE
definitions can also be entered into MDPSim manually.

Once the calls file is written, MDPSim can be started and used to run sum on a sample input.
An example session is shown in Figure 5-14, in which the input 10 is tried on sum and the
statistics are observed. The results will be discussed in more detail in Chapter 7.

MESSAGE sum1
MSG:msgApplyFunction|5
{fSum}
1
IONODE
0
END

MESSAGE sum10
MSG:msgApplyFunction|5
{fSum}
10
IONODE
0
END

MESSAGE sum50
MSG:msgApplyFunction|5
{fSum}
50
IONODE
0
END

Figure 5-13. Rangesum Call File 

Three messages have been defined for calling the sum function with the arguments 1, 10, and 50. IONODE is an
integer constant predefined by MDPSim and denotes the address of the MDP serving as the I/O node between the
J-Machine and the outside world. In MDPSim, the I/O node simply prints every message it receives.



MDPSim -x 2 -y 2 -msize 0x1000 ::Cosmos:Cosmos.m RangeSum.mdp RangeSum.calls
Message-Driven Processor Simulator
Version 7.0 Rev B
Accompanies MDP Architecture Document 11B
Written by Waldemar Horwat
Architecture Updates by Brian Totty and Jerry Larivee
UROPs for Bill Dally
4 MDPs present.

@0..3)watch fault all
@0..3)resetstats
@0..3)inject sum10 @1
@0..3)run
Fault @ 1: {faultXlate0}
Fault @ 1: {BBBW} $008B = DC fltXLATE ;XLATE
Fault @ 1: {lookupBinding}
Fault @ 1: {BBBW} $00C6 = DC fltLookupBinding ;$06
Fault @ 1: {enterBinding}
Fault @ 1: {BBBW} $00C5 = DC fltEnterBinding ;$05
Fault @ 2: {blockSend}
Fault @ 2: {BBBW} $00C2 = DC fltBlockSend ;$02
Fault @ 2: {faultLimit0}
Fault @ 2: {BBBW} $0088 = DC fltLimit ;LIMIT
Fault @ 1: {allocObject}
Fault @ 1: {BBBW} $00C4 = DC fltAllocObject ;$04
Fault @ 1: {lookupBinding}
Fault @ 1: {BBBW} $00C6 = DC fltLookupBinding ;$06
Fault @ 1: {blockMove}
Fault @ 1: {BBBW} $00C1 = DC fltBlockMove ;$01
Fault @ 1: {faultLimit0}
Fault @ 1: {BBBW} $0088 = DC fltLimit ;LIMIT
Fault @ 1: {objectNode}
Fault @ 1: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 2: {faultXlate0}
Fault @ 2: {BBBW} $008B = DC fltXLATE ;XLATE
Fault @ 2: {lookupBinding}
Fault @ 2: {BBBW} $00C6 = DC fltLookupBinding ;$06
Fault @ 2: {enterBinding}
Fault @ 2: {BBBW} $00C5 = DC fltEnterBinding ;$05
Fault @ 3: {blockSend}
Fault @ 3: {BBBW} $00C2 = DC fltBlockSend ;$02
Fault @ 3: {faultLimit0}
Fault @ 3: {BBBW} $0088 = DC fltLimit ;LIMIT
Fault @ 2: {allocObject}
Fault @ 2: {BBBW} $00C4 = DC fltAllocObject ;$04
Fault @ 2: {lookupBinding}
Fault @ 2: {BBBW} $00C6 = DC fltLookupBinding ;$06
Fault @ 2: {blockMove}
Fault @ 2: {BBBW} $00C1 = DC fltBlockMove ;$01
Fault @ 2: {faultLimit0}
Fault @ 2: {BBBW} $0088 = DC fltLimit ;LIMIT
Fault @ 2: {faultXlate0}
Fault @ 2: {BBBW} $008B = DC fltXLATE ;XLATE
Fault @ 2: {lookupBinding}
Fault @ 2: {BBBW} $00C6 = DC fltLookupBinding ;$06
Fault @ 2: {faultXlate0}
Fault @ 2: {BBBW} $008B = DC fltXLATE ;XLATE
Fault @ 2: {objectNode}
Fault @ 2: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 1: {faultXlate0}
Fault @ 1: {BBBW} $008B = DC fltXLATE ;XLATE
Fault @ 2: {objectNode}
Fault @ 2: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 1: {lookupBinding}
Fault @ 1: {BBBW} $00C6 = DC fltLookupBinding ;$06
Fault @ 0: {faultXlate0}
Fault @ 0: {BBBW} $008B = DC fltXLATE ;XLATE
Fault @ 2: {faultCFut0}
Fault @ 2: {BBBW} $008D = DC fltCFUT ;CFUT
Fault @ 0: {lookupBinding}
Fault @ 0: {BBBW} $00C6 = DC fltLookupBinding ;$06
Fault @ 2: {faultXlate0}
Fault @ 2: {BBBW} $008B = DC fltXLATE ;XLATE
Fault @ 1: {enterBinding}
Fault @ 1: {BBBW} $00C5 = DC fltEnterBinding ;$05
Fault @ 3: {blockSend}
Fault @ 3: {BBBW} $00C2 = DC fltBlockSend ;$02
Fault @ 2: {lookupBinding}
Fault @ 2: {BBBW} $00C6 = DC fltLookupBinding ;$06
Fault @ 0: {enterBinding}
Fault @ 0: {BBBW} $00C5 = DC fltEnterBinding ;$05
Fault @ 3: {faultLimit0}
Fault @ 3: {BBBW} $0088 = DC fltLimit ;LIMIT
Fault @ 3: {blockSend}
Fault @ 3: {BBBW} $00C2 = DC fltBlockSend ;$02
Fault @ 1: {allocObject}
Fault @ 1: {BBBW} $00C4 = DC fltAllocObject ;$04
Fault @ 3: {faultLimit0}
Fault @ 3: {BBBW} $0088 = DC fltLimit ;LIMIT
Fault @ 0: {allocObject}
Fault @ 0: {BBBW} $00C4 = DC fltAllocObject ;$04
Fault @ 1: {lookupBinding}
Fault @ 1: {BBBW} $00C6 = DC fltLookupBinding ;$06
Fault @ 0: {lookupBinding}
Fault @ 0: {BBBW} $00C6 = DC fltLookupBinding ;$06
Fault @ 1: {blockMove}
Fault @ 1: {BBBW} $00C1 = DC fltBlockMove ;$01
Fault @ 0: {blockMove}
Fault @ 0: {BBBW} $00C1 = DC fltBlockMove ;$01
Fault @ 1: {faultLimit0}
Fault @ 1: {BBBW} $0088 = DC fltLimit ;LIMIT
Fault @ 0: {faultLimit0}
Fault @ 0: {BBBW} $0088 = DC fltLimit ;LIMIT
Fault @ 1: {faultXlate0}
Fault @ 1: {BBBW} $008B = DC fltXLATE ;XLATE
Fault @ 0: {objectNode}
Fault @ 0: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 1: {objectNode}
Fault @ 1: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 0: {objectNode}
Fault @ 0: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 1: {objectNode}
Fault @ 1: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 3: {objectNode}
Fault @ 3: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 2: {faultXlate0}
Fault @ 2: {BBBW} $008B = DC fltXLATE ;XLATE
Fault @ 0: {faultCFut0}
Fault @ 0: {BBBW} $008D = DC fltCFUT ;CFUT
Fault @ 1: {faultCFut0}
Fault @ 1: {BBBW} $008D = DC fltCFUT ;CFUT
Fault @ 3: {objectNode}
Fault @ 3: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 2: {lookupBinding}
Fault @ 2: {BBBW} $00C6 = DC fltLookupBinding ;$06
Fault @ 3: {faultCFut0}
Fault @ 3: {BBBW} $008D = DC fltCFUT ;CFUT
Fault @ 0: {objectNode}
Fault @ 0: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 1: {objectNode}
Fault @ 1: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 0: {objectNode}
Fault @ 0: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @ 1: {objectNode}
Fault @ 1: {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @  : {objectNode}
Fault @  : {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @  : {faultCFut0}
Fault @  : {BBBW} $008D = DC fltCFUT ;CFUT
Fault @  : {faultCFut0}
Fault @  : {BBBW} $008D = DC fltCFUT ;CFUT
Fault @  : {objectNode}
Fault @  : {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @  : {objectNode}
Fault @  : {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @  : {objectNode}
Fault @  : {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @  : {faultCFut0}
Fault @  : {BBBW} $008D = DC fltCFUT ;CFUT
Fault @  : {faultCFut0}
Fault @  : {BBBW} $008D = DC fltCFUT ;CFUT
Fault @  : {objectNode}
Fault @  : {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @  : {objectNode}
Fault @  : {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @  : {faultXlate0}
Fault @  : {BBBW} $008B = DC fltXLATE ;XLATE
Fault @  : {faultCFut0}
Fault @  : {BBBW} $008D = DC fltCFUT ;CFUT
Fault @  : {objectNode}
Fault @  : {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @  : {objectNode}
Fault @  : {BBBW} $00D3 = DC fltObjectNode ;$13
Fault @  : {faultCFut0}
Fault @  : {BBBW} $008D = DC fltCFUT ;CFUT
Fault @  : {faultXlate0}
Fault @  : {BBBW} $008B = DC fltXLATE ;XLATE
Fault @  : {faultXlate0}
Fault @  : {BBBW} $008B = DC fltXLATE ;XLATE
Fault @  : {faultXlate0}
Fault @  : {BBBW} $008B = DC fltXLATE ;XLATE
Fault @  : {faultXlate0}
Fault @  : {BBBW} $008B = DC fltXLATE ;XLATE
Tick @ 1543
Received priority 0 message:
OBJ:$801D9804 u=1 f=0 offset=$00766 = Reply length
INT:$0000FC00 = 64512
INT:$00000000 = 0
INT:$00000037 = 55

@0..3)stats
1544 ticks executed.


Dynamic Instruction Usage:
STOP:     2887  47.13%
READ:      737  12.03%
WRITE:     500   8.16%
READR:     163   2.66%
SEND:      160   2.61%
DC:        143   2.33%
BR:        130   2.12%
XLATE:     123   2.01%
ROT:       117   1.91%
ADD:       104   1.70%
AND:        98   1.60%
WRITER:     88   1.44%
BT:         71   1.16%
SEND2:      70   1.14%
BNIL:       69   1.13%
NOP:        64   1.04%
BF:         57   0.93%
LDIP:       54   0.88%
SUB:        52   0.85%
SUSPEND:    50   0.82%
XOR:        48   0.78%
CALL:       48   0.78%
WTAG:       44   0.72%
LDIPR:      40   0.65%
EQ:         37   0.60%
SENDE:      26   0.42%
EQUAL:      25   0.41%
SEND2E:     24   0.39%
CHECK:      22   0.36%
RTAG:       21   0.34%
OR:         15   0.24%
GT:         14   0.23%
ASH:        10   0.16%
ENTER:       7   0.11%
GE:          4   0.07%
BNNIL:       4   0.07%
NEG:         0   0.00%
NOT:         0   0.00%
FFB:         0   0.00%
INVAL:       0   0.00%
PROBE:       0   0.00%
LSH:         0   0.00%
NEQ:         0   0.00%
MUL:         0   0.00%
MULH:        0   0.00%
NEQUAL:      0   0.00%
CARRY:       0   0.00%
HALT:        0   0.00%
BZ:          0   0.00%
LE:          0   0.00%
BNZ:         0   0.00%
LT:          0   0.00%

STOP:     2887  47.13%
Move:     1488  24.29%
ALU:       407   6.64%
Branch:    331   5.40%
Network:   330   5.39%
Field:     204   3.33%
DC:        143   2.33%
Fault:     142   2.32%
Assoc:     130   2.12%
NOP:        64   1.04%
Other:       0   0.00%
Foregnd:  3239  52.87%
Total:    6126

Fault Usage:
objectNode:     21  26.25%
faultXlate0:    14  17.50%
lookupBinding:  11  13.75%
faultCFut0:     10  12.50%
faultLimit0:     8  10.00%
blockSend:       4   5.00%
allocObject:     4   5.00%
enterBinding:    4   5.00%
blockMove:       4   5.00%
Total:          80

The xlate hit ratio is 109 out of 123 (88.62%).
376 words sent in 51 messages on priority 0.
Average message size: 7.37.
16.29 instructions/word (8.61 foreground instructions/word)
120.12 instructions/message (63.51 foreground instructions/message)
No priority 1 words sent.
@0..3)

Figure 5-14. MDPSim Transcript 

This transcript shows an MDPSim session in which the user loads the rangesum assembly code and calls the sum
function with the argument 10 on a 2x2x1-node J-Machine with Cosmos using only internal memory (-msize
0x1000). Since watching faults was enabled, MDPSim prints each fault encountered at each MDP as it is running.
The fault message gives the number of the MDP on which the fault occurred, the number of the fault vector,
and the name of the fault; the {BBBW} is additional MDPSim breakpoint and watchpoint information. Finally, after
1544 steps the answer 55 is produced and displayed.

The dynamic instruction statistics for the run are also shown. About half of the time is spent distributing the
functions to all of the nodes; the second time sum is called with the argument 10, it takes only 893 ticks to produce the
answer (a tick is the time it takes every node to execute one instruction; MDPSim assumes that every instruction
runs in the same amount of time).
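
The summary ratios at the end of the transcript follow directly from the raw counts it prints. The short Python check below recomputes them; the variable names and the `per` helper are illustrative choices, and the numbers are copied from the transcript above.

```python
# Raw counts copied from the MDPSim statistics in Figure 5-14.
total_instructions = 6126        # every instruction slot, including idle STOPs
stop_instructions = 2887         # idle STOP instructions
words_sent = 376
messages_sent = 51
xlate_hits, xlate_lookups = 109, 123

# "Foreground" instructions are everything except STOP.
foreground_instructions = total_instructions - stop_instructions


def per(numerator, denominator):
    """Return numerator / denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


report = {
    "average message size": per(words_sent, messages_sent),
    "instructions/word": per(total_instructions, words_sent),
    "foreground instructions/word": per(foreground_instructions, words_sent),
    "instructions/message": per(total_instructions, messages_sent),
    "foreground instructions/message": per(foreground_instructions, messages_sent),
    "xlate hit ratio (%)": per(100 * xlate_hits, xlate_lookups),
}

for name, value in report.items():
    print(f"{name}: {value}")
```

The foreground instructions per message figure (about 63.5) is the quantity that corresponds to the grain size discussed in Chapter 7.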






Chapter 6. Debugging 



Optimist II, Cosmos, and the Concurrent Smalltalk applications are large programs, and de- 
bugging them is an important consideration. I will not discuss the process of debugging Op- 
timist II itself; standard Common Lisp and CLOS techniques such as building firewalls and 
providing print routines for important data structures were used. 

My primary approach to debugging MDP code was prevention. I made sure that the
Cosmos design was sound before running it. The criticality criteria were very helpful in
avoiding re-entrancy and double fault problems. Nevertheless, while the prevention
approach was successful on Cosmos itself, it cannot be the sole debugging method used on the
Concurrent Smalltalk programs. Instead, a combination of debugging means at various
levels has been provided.

Debugging Concurrent Smalltalk Code 

The first line of defense is the Optimist II compiler itself. The compiler will complain when it 
detects errors such as incorrect function argument counts or bad types, if types are declared. 

The second line of defense is the interpreter in the Optimist II compiler. The interpreter can 
be used to run Concurrent Smalltalk programs before they are downloaded into MDPSim or 
onto a J-Machine. The interpreter provides nearly complete checking of Concurrent 
Smalltalk programs, so it should catch most of the remaining bugs. However, the interpreter 
will not catch bugs which occur only on large data sets, nor will it find Cosmos's or the Opti- 
mist II code generator's bugs. 

Debugging MDP Code on MDPSim 

Debugging becomes considerably more difficult once the code is in assembly language form. 
Fortunately, Cosmos does include some facilities for debugging Concurrent Smalltalk pro- 
grams. 

The third line of defense comprises the safety features built into the MDP architecture.
Type and bounds checking were extremely valuable when debugging Cosmos, as they catch
most common type errors when they happen and prevent runaway programs from doing too
much damage to the machine state. Without these facilities, debugging Cosmos and
Concurrent Smalltalk programs could have been intractable.

The fourth line of defense consists of safety checks built into a number of critical places in 
Cosmos. These checks include: 

• A check in the CFUT handler that distinguishes real cfutures from uninitialized vari- 
ables, together with the initialization of memory and globals to values that will cause CFUT 
faults. 

• Checks in the XLATE and INVADR handlers for references to primitive, nonexistent, or 
deleted objects. Without these checks, such references would generate messages that wander 
about the J-Machine forever. 

• A check in the Return handler to make sure that the context was expecting the value
that was returned. This check catches the extremely elusive bug of replying to the same
continuation twice, in which the second reply message may overwrite a variable in the context
after it has been reallocated to a completely unrelated function. The bug will be caught eventually,
even if the second function stores a cfuture into the same context location, because then there
will still be two replies to the same context location, and the cycle will repeat itself. Of
course, by the time the bug is caught, the original evidence may be gone, but at least
there will be some indication of a problem.

• A check in the Co routine for a reference to a nonexistent constituent of a distributed ob- 
ject. 

• A HALT on any reference out of bounds of any object except in BlockMove and Block- 
Send. 

• HALT instructions on any type or overflow faults that occur in the course of execution of 
Concurrent Smalltalk programs. 
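
The idea behind the Return handler check in the list above can be modeled in a few lines of Python. This is a toy model, not Cosmos code: the `Context` class, the `CFUT` marker object, and `DoubleReplyError` are hypothetical names, and a real context slot would hold a cfuture-tagged word rather than a Python object.

```python
class DoubleReplyError(Exception):
    """Raised when a reply arrives for a slot that was not awaiting one."""


CFUT = object()  # stand-in for a cfuture-tagged word in a context slot


class Context:
    def __init__(self, size):
        self.slots = [None] * size

    def expect_reply(self, index):
        """Mark a slot as waiting for a reply by storing a cfuture in it."""
        self.slots[index] = CFUT

    def reply(self, index, value):
        """Deliver a reply, but only if the slot still holds a cfuture."""
        if self.slots[index] is not CFUT:
            raise DoubleReplyError(f"slot {index} was not expecting a reply")
        self.slots[index] = value


ctx = Context(8)
ctx.expect_reply(6)
ctx.reply(6, 55)          # first reply fills the slot
try:
    ctx.reply(6, 55)      # second reply to the same continuation is rejected
except DoubleReplyError as err:
    print("caught:", err)
```

As the text notes, the check is not airtight: if the context has been reallocated and the new owner has stored a fresh cfuture in the same slot, the stray reply is accepted and the error only surfaces when the legitimate reply arrives later.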

Furthermore, MDPSim does its part to make debugging easier. Once the operating system is
loaded, memory used by the operating system code is read- and write-protected (it may only
be executed) to catch any runaway references to it. Since dereferencing nil is a common
mistake in the MDP's unchecked mode, physical memory locations 0 through 3 have been
protected from all accesses to catch any routines that dereference nonexistent objects.
Moreover, MDPSim immediately halts if a message is sent to a nonexistent node.

MDPSim includes the HALT instruction, which is not present on the MDP. The HALT
instruction immediately halts the simulated J-Machine without altering any state. However, the
HALT instruction can almost be emulated on the J-Machine: executing HALT will cause
either an invinst or a catastrophe fault, which can be intercepted.

Moreover, the newest MDPSim [25] includes hazard detection: MDPSim 7.0 will complain
and optionally stop the program if it detects an unsafe programming construct such as
referencing the FIR register if it could have been altered by an asynchronous interrupt or sending
a message when the F bit is set (a network send fault could be catastrophic in this case).
Clearly MDPSim cannot discover all such possible bugs, but it can provide considerable
assistance in uncovering sporadic asynchronous bugs.

Finally, MDPSim is deterministic: running the same program twice will always yield identical
results. Thus, if an inexplicable bug occurs, it can always be reproduced. Moreover,
earlier snapshots in time can be examined by running the same session again in MDPSim. On
the Macintosh version of MDPSim, the entire session is automatically saved, making
reproducing it easy.

Debugging MDP Code on a J-Machine 

Debugging code on a real J-Machine is still harder than debugging it with MDPSim. Cosmos 
currently does not include any facilities specifically designed for such debugging other than 
the ones described above, but such facilities are being added in the true J-Machine version of 
it. The primary facilities consist of a set of mousetraps to catch weird conditions such as 
hardware errors and a set of fault handlers that interact with the host through the diagnostic 
port. Unfortunately, it is impossible to examine an MDP's state without destroying some 
register values, so debugging on the hardware is much harder. 

Assuming one can stop the computation at a safe point, it is possible to get a dump of all 
memory and most registers on each MDP in a J-Machine. What does one do with a huge 
dump of the state of a J-Machine? One possible course of action would be to examine it using 
MDPSim's debugging facilities. Another possibility is periodically checkpointing the compu- 
tation on the J-Machine by saving images. If a crash occurs, earlier images can be examined 
or restarted to determine the cause of the crash. 

Summary 

Debugging Concurrent Smalltalk code, while not especially easy, is not impossible. Several
lines of defense against bugs in Concurrent Smalltalk programs are provided. It is highly
recommended to try to find bugs in the earlier steps of the compilation process because the
tools at those levels are more robust and informative (but not as faithful to the J-Machine).

Although Cosmos includes many checks for the common Concurrent Smalltalk programming
errors, Cosmos does not protect itself from itself; it does not detect corruption in its data
structures. Fortunately, segmentation by the MDP ensures that those data structures could
only be corrupted by Cosmos itself, as well-compiled Concurrent Smalltalk programs cannot
reference data outside their segments. Cosmos was mainly debugged by design, with only
minor debugging necessary once the operating system was written.

MDPSim also helps in debugging MDP code by providing watchpoints, breakpoints, the halt 
instruction, hazard detection, and determinism, which allows any bug to be reproduced. 
Chapter 7. Performance Measurements

Both Cosmos and the code output by Optimist II were optimized for speed. This chapter pre- 
sents some measurements that determine just how fast compiled Concurrent Smalltalk runs 
on a J-Machine. Both theoretical derivations and real measurements are presented and 
compared. Both calculations indicate that the average grain size (the ratio of useful instruc- 
tions executed to messages sent) for running Concurrent Smalltalk on a J-Machine is be- 
tween 50 and 70 instructions, and the average number of instructions executed per method is 
about 100 instructions. This is a pity if the average method only performs a few instructions' 
worth of real computation, yet, since Cosmos and the code output by Optimist II are already 
heavily optimized, it does not seem likely that incremental changes will reduce these num- 
bers much further. 

In addition to the above figures, various other statistics are presented. The static and dy- 
namic instruction use frequencies were collected to identify areas in which the MDP's hard- 
ware performance could be improved; no major surprises were found there. These frequen- 
cies indicate that the MDP spends an average of about 2 cycles per instruction; this number 
increases to 4 if slow external DRAM is used to hold the user program and data. 

Finally, the network load is analyzed. The network should not become saturated until more 
than 343 MDPs are put together; if a larger J-Machine is to be built, either the network will 
have to be made faster, the operating system slower, or considerable attention will have to be 
paid to locality. 
7.1. Derived Times

This section presents some rough estimates of the overhead on the J-Machine. A number of 
assumptions are made when making these estimates; the results of actual measurements 
will be reported in the next section to verify those assumptions. 

Cosmos Estimates 

The instruction counts needed for various important Cosmos services are shown in Table 7-1.
The counts are approximate, because some routines were only estimated, but they are usually
accurate to within a few instructions.

Table 7-1. Selected Cosmos Routine Instruction Counts

Routine | Instruction Count | Description

Method and Control Managers
Apply | 3+ApplySelector or 5+ApplyFunction | Dispatch a general Apply message.
ApplyFunction | | Dispatch an ApplyFunction message.
ApplySelector | ≥23 (≥15+LookupMethodU) | Dispatch an ApplySelector message.
LookupMethod | 8+LookupMethodU | Look up a method given a class and a selector.
LookupMethodU | 8 on cache hit; 40+SaveStateID023+message latency on cache miss | Internal core of LookupMethod.
CFUT Fault | ≈30+2msize if context available on queue (14+SaveStateID023) | Save state when a cfuture was read from the context.
Reply | 27 if process is restarted; 12 if not | Process a reply message.
RestartContext | 20 | Unconditionally restart a context.

Context Manager
SaveStateID023 | 14 if message already saved in context and new context available in queue; 16+2msize if context available on queue; 17+2msize+AllocNextObject otherwise | Save the ID registers and the message in the context, save the context, and suspend.

Global Object Manager
NewObject | ≈37+2msize+Reply (21+SaveStateID023+Reply) | Allocate a remote object.
ClassOf | 15 to 25 (10+TypeOf) | Return the class of an object.
TypeOf | 5 to 15, depending on tag (5 for integers, 11 for ordinary user objects, varies for others) | Internal core of ClassOf.
ObjectNode | 9 for primitive objects; 4 for ordinary user objects; 32 for distributed objects | Return the node most likely to contain the object or a random node if the object is primitive.
Co | 38 (49 when the object has more constituents than there are nodes) | Return the ID of the nth constituent of a distributed object.
PreferredConst | 27 (12 when the object has more constituents than there are nodes) | Return the ID of a nearby constituent of a distributed object.
MigrateObject | ≈62+AllocObject+LookupBinding+2size (may vary if more or fewer contexts are restarted) | Receive and install an object and restart a context waiting for it.
UpdateHome | ≥11+LookupBinding | Update a migrated object's home BRAT entry.
Unlock | 9 | Unlock an object.

Local Object Manager
NewLocalObject | 3+AllocNextObject | Allocate a local object of the given class.
AllocNextObject | 12+AllocObject+EnterBinding | Allocate a local object using the next ID and the given header word.
DeallocateObject | 11+PurgeBinding | Deallocate a local, unlocked object.

BRAT Manager
EnterBinding | 26 (35 if no free BRAT entries were available; may also compact heap) | Allocate a new BRAT binding.
LookupBinding | 14+5n, where n is the number of links traversed in linked list | Look up a binding in the BRAT.
PurgeBinding | 2+DeleteBinding | Delete a binding from the BRAT and the XLATE table.
DeleteBinding | 23+5n, where n is the number of links traversed in linked list | Delete a binding from the BRAT.

Heap Manager
AllocObject | 20; may also compact heap | Allocate an object on the heap.
CompactHeap | varies from 2N to 10N or more, where N is the size of the heap | Compact the heap.

Utilities
Divide | from 40 for small numbers to 400 for large numbers | Divide two 32-bit numbers and return the quotient and remainder.

Faults
Early Fault | 8 | Penalty for reading data from message queue too fast.
Send Fault | 8 | Penalty for sending data into network too fast.

Some Definitions: 

size is the size of the object. 

msize is the size of the message in the queue. If the message has already been saved and the Q flag is false, 
msize is defined to be -1 for the purposes of the above timings. If msize is mentioned in a time expression, the 
current process is suspended and later restarted; the time does not include the time between the suspension and 
the resumption because other processes are assumed to execute then. 

User Program Estimates 

In contrast to the counts in Table 7-1, an examination of the rangesum method in Figure 5-12
shows that it takes about 13 instructions¹ to execute a function or method call and about 8
instructions to return a reply and suspend (see Table 7-2). Thus, the typical time the MDPs
spend in user code to execute a function call and return is about 21 instructions; perhaps a 
few more instructions are used for primitives, but the user code execution time is seldom 
more than 30 instructions per function invocation. Hence, estimating conservatively, any 



¹ There are several NOPs, not shown in the listing, caused by alignment around DCs.
Table 7-2. Selected User Action Instruction Counts

Action              Instruction Count                               Description

Function or         ≈11+nargs. May be higher if arguments           Call a function or a method. The time does
Method Call         must be touched or lower if many SEND2s         not include the CFUT fault or reply time.
                    are used.
Reply with          8-10                                            Return a reply to the caller.
Suspend
Primitive           1-4 for instructions and up to 400 or           Perform a primitive operation such as an
                    more for system calls.                          addition or a conditional.

Nargs is the number of arguments sent in the application message.



time above 30 instructions per function or method invocation is spent in the operating
system¹.

Analysis 

A juxtaposition of the main figures from Tables 7-1 and 7-2 reveals that a typical program 
will spend about 70% of its active time in the operating system and 30% of the time in user 
code. Furthermore, the program will take about 100 instructions per function invoked, ex- 
cept for tail-forwarded functions which will only take about 25 instructions each. About 20 
extra instructions should be added for each method dispatch that the compiler is unable to 
optimize out. To derive these estimates the following system of accounting is used: the work 
ascribed to a function invocation consists of all work needed to call the function on the origi- 
nating node plus all work needed to dispatch the function on the called node, but not includ- 
ing the work done by the called function to call other functions. 

Standard Invocations 

Each non-tail-forwarded function invocation requires the processing of an objectNode call, 
a function message send, a reply message, and optionally a cfuture fault on the originating 
node, and a function dispatch and a reply on the called node. Assuming that the average 
function call has two arguments, the total operating system work for the above activity is: 

      ObjectNode + ApplyFunction
        + c(CFUT fault + restarting Reply) + (1-c)(non-restarting Reply)
    = 9 + 4 + 69c + 12(1-c)
    = 25 + 57c instructions.

Here c is the probability that a cfuture will be referenced before being replaced by the returned
value. This probability can vary over a wide range depending on the branching factor of the
program call graph: c is 1.0 for a recursive factorial program and 0.5 for a recursive Fibonacci
or rangesum program. If a branching factor between 1 and 2 is assumed, c will be somewhere
between 0.5 and 1.0; suppose it is 0.75, which results in 68 instructions executed in the
operating system per function invocation.

The total user code work is 

Function call + Primitives called by function + Reply with Suspend. 
The time spent executing primitives will vary greatly depending on the application; 10
instructions seems reasonable for most cases, although it will be higher if the user program calls
Divide or allocates objects. Substituting this number and the average number of arguments
yields a total user code work of



1 Tail-forwarded calls are cheaper because the net cost of a tail-forwarded call is one call and no return, which is 
about 15 user code instructions. 
13 + 10 + 9 = 32 instructions.

Thus, the total amount of work taken to process one function invocation is 100 instructions, 
out of which about 10 instructions (the primitives) could be construed as being "useful" work 
and the rest overhead. This figure does not include any object migration or XLATE miss 
overhead. These results should not be interpreted as implying that an MDP running Cosmos 
has a performance 10 times slower than a comparable processor in a sequential computer be- 
cause sequential computers also have a considerable function calling and parameter passing 
overhead. 
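
The accounting above can be sketched in a few lines of Python. This is only an illustrative restatement of the arithmetic in the text; the function names and default parameters are hypothetical, and the constants are the instruction counts quoted from Tables 7-1 and 7-2.

```python
# Instruction counts quoted in the text (Tables 7-1 and 7-2).
OBJECT_NODE = 9            # ObjectNode call, as used in the estimate above
APPLY_FUNCTION = 4         # function dispatch on the called node
CFUT_AND_RESTART = 69      # CFUT fault (42) + restarting Reply (27)
NON_RESTARTING_REPLY = 12  # Reply when the process need not be restarted


def os_work(c: float) -> float:
    """Operating system instructions per non-tail-forwarded invocation.

    c is the probability that a cfuture is touched before its value arrives.
    """
    return (OBJECT_NODE + APPLY_FUNCTION
            + c * CFUT_AND_RESTART
            + (1.0 - c) * NON_RESTARTING_REPLY)


def user_work(nargs: int = 2, primitives: int = 10, reply: int = 9) -> int:
    """User code instructions: call (about 11+nargs) + primitives + reply."""
    return (11 + nargs) + primitives + reply


def total_work(c: float = 0.75) -> float:
    """Total instructions ascribed to one function invocation."""
    return os_work(c) + user_work()


if __name__ == "__main__":
    for c in (0.5, 0.75, 1.0):
        print(f"c = {c:4.2f}: OS {os_work(c):6.2f}, total {total_work(c):6.2f}")
```

With c = 0.75 this gives 67.75 operating system instructions and 99.75 instructions in total, which the text rounds to 68 and 100.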

Tail-Forwarded Invocations 

Tail-forwarded applications are considerably more efficient. Using the accounting method 
outlined above results in ascribing 



[Figure 7-1: timeline of one function invocation across the calling node, the network, and the called node.
  Caller:  ObjectNode (9 instructions), Begin Message Send (11 instructions), Finish Message Send (2 instructions),
           Fault on Cfuture (42 instructions), Receive Reply and Resume (27 instructions)
  Network: Message Travel Time (10 instructions) in each direction
  Callee:  ApplyFunction (4 instructions), Called Function (n instructions), Send Reply Message (8 instructions),
           Suspend (1 instruction)]

Figure 7-1. Function Invocation Latency

The latency of the network is estimated at about 10 instruction times (20 cycles) to send a message between two
randomly chosen nodes on a 4096-node machine.

If n is the time taken by the called function, the latency of invoking a function is 9+11+10+4+8+10+27+n = 79+n
instructions unless the called function takes fewer than 12 instructions, in which case the latency is 9+11+2+42+27
= 91 instructions.
ObjectNode + ApplyFunction = 13 instructions

operating system overhead and 

Function call + Primitives called by function = 23 instructions 

user code work. The total work done is 36 instructions, out of which again 10 instructions are
"useful" work.

Latency 

The preceding analysis calculated the total amount of work needed per function invocation in 
a program, which determines throughput on a fully loaded system in which each processor is 
busy; however, another important component of performance is latency. It turns out that the 
latency of a function invocation can be lower than the amount of work done by the function 
invocation because two processors (the caller and the callee) can execute much of the function 
invocation in parallel. 

Assuming no other activity in the system, a non-tail-forwarded function invocation will con- 
sist of the caller sending a message to the callee. Then the callee evaluates the function, 
while the caller takes a cfuture fault (or calls another function, but this won't matter). Un- 
less the called function is very short, the caller will finish the cfuture fault processing and 
then idle before it gets the reply message from the callee. Finally, the callee replies to the 
caller, which restarts the calling process. 

As can be seen in Figure 7-1, the latency of a function call is 79 instructions in addition to the 
time taken to execute the function; if the function takes fewer than 12 instructions to exe- 
cute, the overall latency is 91 instructions. These numbers are less than the total amount of 
work done by the system (104 instructions). 
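
The two cases in Figure 7-1 can be read as the longer of two paths: one through the callee and the network, and one that stays on the caller. The following Python sketch restates that reading; the function and constant names are hypothetical, and the instruction counts are the ones shown in the figure.

```python
# Instruction counts taken from Figure 7-1.
OBJECT_NODE = 9
BEGIN_SEND = 11
FINISH_SEND = 2
CFUT_FAULT = 42
RECEIVE_REPLY_AND_RESUME = 27
NETWORK_TRAVEL = 10
APPLY_FUNCTION = 4
SEND_REPLY = 8


def invocation_latency(n: int) -> int:
    """Latency, in instruction times, of a call whose body takes n instructions."""
    # Path through the callee: request travels, function runs, reply travels back.
    callee_path = (OBJECT_NODE + BEGIN_SEND + NETWORK_TRAVEL + APPLY_FUNCTION
                   + n + SEND_REPLY + NETWORK_TRAVEL + RECEIVE_REPLY_AND_RESUME)
    # Path on the caller alone: it must finish its own cfuture fault handling
    # before it can process the reply.
    caller_path = (OBJECT_NODE + BEGIN_SEND + FINISH_SEND + CFUT_FAULT
                   + RECEIVE_REPLY_AND_RESUME)
    return max(callee_path, caller_path)


if __name__ == "__main__":
    for n in (0, 12, 50):
        print(n, invocation_latency(n))
```

The two paths are equal at n = 12, which is where the figure's crossover between 91 and 79+n instructions comes from.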

Summary 

The results above indicate that the number of instructions needed to process a function invo- 
cation for Cosmos running on a J-Machine should be about 100 instructions, with the notable 
exception of tail-forwarded functions, which require only about 36 instructions. The instruc- 
tion counts may be higher if many primitive calls are made or if the operating system faults 
often. 
7.2. Measurements

Grain Size and Machine Load 

To attempt to measure the J-Machine's performance and grain size, I ran several programs, 
including factorial (Figure 7-2); rangesum as listed in Chapter 5; rangesum2 (Figure 7-3), 
which is a version of rangesum which builds and traverses a data structure; and sort (Figure 
7-4), which generates and sorts an array of n pseudo-random numbers using the Batcher 
parallel sort technique described on page 112 of [28]. 

(defun fact (n) 
(if (zero? n) 
1 
(* n (fact (- n 1))))) 

Figure 7-2. Factorial Program 

(defclass pair (object)
  car
  cdr)

(defun cons (x y) :pair
  (put-car-cdr (new pair) x y))

(defmethod put-car-cdr pair (x y) :pair
  (cset car x)
  (cset cdr y)
  self)

(defun make-countlist (low: integer high: integer)
  (if (> low high) (halt))
  (if (= low high)
      low
      (let ((middle (// (+ low high) 2)))
        (cons (make-countlist low middle)
              (make-countlist (+ middle 1) high)))))

(defmethod reduce pair (op:funct)
  (op (reduce car op) (reduce cdr op)))

(defmethod reduce integer (op:funct)
  self)

(defun add (x y)
  (+ x y))

(defun reduce-add (tree)
  (reduce tree add))

(defmethod ramp integer ()
  (make-countlist 0 self))

(defmethod rangesum2 integer ()
  (reduce-add (ramp self)))

Figure 7-3. Rangesum2 Program 

This program exercises several Concurrent Smalltalk object facilities such as allocating objects and traversing 
trees. 
(defclass distarray (distobj)
  value)

(defmethod initialize distarray (low, high: integer f:funct)
  (if (= low high)
      (cset (get-value (co group low)) (f low))
      (clet ((middle (// (+ low high) 2)))
        (concurrently
          (initialize group low middle f)
          (initialize group (+ middle 1) high f)))))

(defun make-distarray (n modulus)
  (clet ((da (new distarray n)))
    (initialize da 0 (- n 1) (lambda (x) (mod (* x x x) modulus)))
    da))

(defmethod sort-exchanges distarray (low, high, p, r, d: integer)
  (if (<= low high)
      (if (= low high)
          (clet ((low2 (+ low d)))
            (clet ((v1 (get-value (co group low)))
                   (v2 (get-value (co group low2))))
              (if (> v1 v2)
                  (concurrently
                    (cset (get-value (co group low)) v2)
                    (cset (get-value (co group low2)) v1)))))
          (clet ((middle (// (+ low high) 2)))
            (concurrently
              (sort-exchanges group low middle p r d)
              (sort-exchanges group (+ middle 1) high p r d))))))

(defmethod sort-q distarray (p, q, r, d: integer)
  (sort-exchanges group 0 (- (logical-limit self) (+ d 1)) p r d)
  (if (<> p q)
      (sort-q group p (// q 2) p (- q p))))

(defmethod sort-p distarray (half, p: integer)
  (sort-q group p half 0 p)
  (if (> p 1)
      (sort-p group half (// p 2))
      group))

(defmethod sort distarray ()
  (clet ((half (ash 1 (- (integer-length (- (logical-limit self) 1)) 1))))
    (sort-p group half half)))

(defun sort-distarray (n modulus)
  (sort (make-distarray n modulus)))

Figure 7-4. Sort Program 

Sort-distarray, given the values of n and modulus, sorts an array of n pseudo-random numbers. The ith pseudo-
random number is equal to i³ mod modulus. The Batcher sort algorithm is used, as presented on page 112 of [28].

Measurements were done on a 4-node and a 16-node simulated J-Machine. The results of the 
trials are summarized in Table 7-3. 

The grain size is the third number in the working instructions executed column. The time to 
process one function invocation is approximately twice the grain size unless tail-forwarding 
is used extensively. Except for sorting 4 numbers and the trivial factorial case, the results 
indicate function invocation times of between 81 and 162 instructions, which means that the 
estimate of 100 in the previous section was about right. Many of the functions in the sort 
sample program are tail-forwarded, so the average function invocation time for that example 
is less than twice the grain size. In addition, the sort program has a grain size higher than
predicted in the previous section. This is probably due to frequent calls to the multiplication,
division¹, and co primitives as well as to distribution of large code objects; the grain size does
decrease for larger input values.



¹ A division by 2 is just a single ASH instruction, but the division in make-distarray requires a complete Divide call.
Table 7-3. Performance Measurements

                                                 Total Instructions Executed    Working Instructions Executed             Net       Net      Avg
Program     # MDPs    Input  Invoc.  Startup       Count    /Word     /Msg        Count    /Word     /Msg     % Busy     Words     Msgs      Msg
                                                                                                                         Sent      Sent     Size*

factorial   4 (2x2)       0       1  cold             95     8.64    47.50           17     1.55     8.50       18          11         2     5.50
                         10      11  cold           5949    30.82   212.46         2001    10.37    71.46       34         193        28     6.89
                                     warm           3407    28.16   154.86         1078     8.91    49.00       32         121        22     5.50

rangesum    4 (2x2)      10      21  cold           6985    18.43   134.33         3364     8.88    64.69       48         379        52     7.29
                                     warm           3122    12.10    72.60         1737     6.73    40.40       56         258        43     6.00
                         50     101  cold          17017    12.71    80.27        11585     8.65    54.65       68        1339       212     6.32
                                     warm          11998     9.85    59.10         9395     7.71    46.28       78        1218       203     6.00
                                     hot           10982     9.02    54.10         8841     7.26    43.55       81        1218       203     6.00

rangesum2   4 (2x2)      10      21  cold          15365    24.08   174.60         6194     9.71    70.39       40         638        88     7.25
                                     warm           5470    14.47    86.83         3067     8.11    48.68       56         378        63     6.00
                         50     101  cold          27971    13.40    84.76        19559     9.37    59.27       70        2088       330     6.33
                                     warm          23232    12.57    75.18        16401     8.88    53.08       71        1848       309     5.98
                                     hot           21418    11.78    70.69        15767     8.67    52.04       74        1818       303     6.00

sort        4 (2x2)       4          cold          57939    30.27   298.65        23982    12.53   123.62       41        1914       194     9.87
                                     warm          35168    36.79   256.70        14655    15.33   106.97       42         956       137     6.98
                         29          cold         351144    14.85   101.22       269019    11.38    77.55       77       23647      3469     6.82
                                     warm         289336    12.94    86.55       232974    10.42    69.69       81       22361      3343     6.69

sort        16 (4x4)      4          cold         201681    95.90   979.03        24026    11.42   116.63       12        2103       206    10.21
                         29          cold         868586    32.56   238.56       295483    11.08    81.15       34       26679      3641     7.33
                        100          cold        2612981    18.61   126.35      1469377    10.46    71.05       56      140436     20680     6.79



*The average message length includes the address word sent at the beginning of each message. That word is 
kept as the message is routed through the network but removed before the message is inserted into the queue on 
the destination node. 

The working instruction counts are instruction counts with all stop instructions executed in background loops re- 
moved; they represent the useful work done in the system. 

The three numbers in the total instructions executed and working instructions executed columns give the absolute 
numbers of instructions executed, the numbers of instructions per word of network traffic, and the number of in- 
structions per network message, in that order. 
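
The derived columns of Table 7-3 follow directly from the absolute counts, which gives a quick way to cross-check a row. The sketch below recomputes them for the cold-start factorial trial with input 10; the function name and dictionary keys are hypothetical, while the four input numbers are taken from the table.

```python
def derived_columns(total_instr: int, working_instr: int,
                    net_words: int, net_msgs: int) -> dict:
    """Recompute the per-word and per-message figures of one Table 7-3 row."""
    return {
        "total_per_word": round(total_instr / net_words, 2),
        "total_per_msg": round(total_instr / net_msgs, 2),
        "working_per_word": round(working_instr / net_words, 2),
        # Working instructions per message is the grain size.
        "grain_size": round(working_instr / net_msgs, 2),
        "avg_msg_size": round(net_words / net_msgs, 2),
    }


if __name__ == "__main__":
    # factorial, 4 MDPs, input 10, cold start
    row = derived_columns(5949, 2001, 193, 28)
    for name, value in row.items():
        print(f"{name:18s} {value}")
```

The same function applied to other rows reproduces their second and third numbers, so the grain size can be read as working instructions divided by messages sent.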

A cold startup indicates that the program was executed just after it was loaded; a considerable portion of the run- 
ning time is spent on distributing the functions to all nodes that need them. 

A warm startup indicates that the program was executed after the functions it needed were already installed on 
every node. 

A hot startup indicates the third trial of the program on the particular input. This time may be less than the warm 
startup time because the previous trials have preallocated enough standard contexts on the MDPs to let the pro- 
gram run without the need to allocate any more contexts. Warm and hot startup times are probably the most rep- 
resentative of the J-Machine's performance on larger problems. 

The geometry of the J-Machine does not have much of an effect on a program simulated under MDPSim. It is 
unimportant anyway for the small sizes simulated above. 

Using inputs much larger than 50 for the range sums or 100 for the sort generated too much concurrency and 
caused the message queues to overflow. See Chapter 8 for a possible solution to this problem. 

Another pattern in Table 7-3 is that the percentage of the J-Machine that is busy is higher
for the larger problems, which was to be expected. Also, the warm and hot start programs
tended to exhibit more concurrency than the cold start ones¹; apparently there is some
wasted time during the initial code distribution phase.



¹ This was not the case in a slightly earlier version of Cosmos, possibly because it was less efficient and therefore had
more work to do.
Comparison with Dataflow

Ellen Spertus made a few performance numbers available for her implementation of dataflow 
on the J-Machine [34]. I compared her timings with those obtained by Optimist II/Cosmos on 
the same examples. The program used was the factorial function listed in Figure 7-5. 

The dataflow interpreter took 431 steps to compute the factorial of 4. The Concurrent 
Smalltalk version of the factorial program took 725 steps to execute from a cold start but only 
265 steps from a hot start. The dataflow interpreter allocates code statically and references 
absolute addresses, so every timing is effectively a hot start. The dataflow interpreter took 
628 steps to compute three factorials of 4 in parallel, while the Concurrent Smalltalk code 
took 399 steps to complete the task. Thus, for this simple example the Concurrent 
Smalltalk/Optimist II/Cosmos combination is faster than dataflow, but not by much. How- 
ever, Concurrent Smalltalk is more dynamic than the current dataflow system in [34]. 

(defun fact (n) 
(if (<= n 1) 
1 
(* n (fact (- n 1))))) 

Figure 7-5. Factorial Program used in Dataflow

Network Load

As seen in Table 7-3, the network loading is usually between one word every 8 instructions
and one word every 20 instructions, with the former figure dominating as the J-Machine
utilization approaches 100%. If an average MDP instruction length is taken to be 2.0 cycles,
this implies that a program could inject words into the network as fast as one word every 16
cycles on every MDP.

Suppose that we run one of the above programs on a J-Machine organized as a k×k×k mesh.
Let N = k×k×k be the number of nodes. To a first-order approximation, the capacity of the
network is 3N half-word-hops/cycle¹, or 1.5N word-hops/cycle. Assuming random sources and
destinations, a message will have to travel an average of k/3 nodes on each of the three
dimensions, so the expected distance the message has to travel is 3k/3 = k nodes. Hence, the
network's theoretical capacity is the delivery of 1.5N/k = 1.5k² words per cycle. On the other
hand, the program offers N/16 words/cycle to the network, which means that unless locality
is exploited or the program slowed down, there will be an upper bound on the size of the
J-Machine which can run Cosmos.

A mesh loaded at about 30% of its theoretical capacity should be able to route messages 
without excessive delays [32]. To calculate the maximum k, set 

        0.3 × 1.5k² = k³/16

        k = 7.2.

Thus, the network should not become a critical resource until a J-Machine with over 7³ = 343
nodes is built. If the network routing speed is doubled, network loading should not be
problematic until the J-Machine exceeds 14³ = 2744 nodes. On the other hand, should the
Cosmos operating system be sped up somehow, the critical size might fall below 343 nodes.
Serious attention to locality will have to be paid if a J-Machine larger than a few hundred nodes
is built; conversely, if only a small J-Machine is built, it may not be adequate for testing
algorithms for exploiting locality because almost any algorithm will work.
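
The bound can be recomputed for other assumptions with a short Python sketch. It simply solves load × capacity × k² = k³ / (cycles per injected word) for k; the function names and parameter names are hypothetical, and the default values are the figures used in the text.

```python
import math


def critical_mesh_side(load_fraction: float = 0.3,
                       word_hops_per_node_cycle: float = 1.5,
                       instructions_per_word: float = 8.0,
                       cycles_per_instruction: float = 2.0) -> float:
    """Side length k of the k x k x k mesh at which the offered load reaches
    the given fraction of the network's theoretical capacity."""
    cycles_per_word = instructions_per_word * cycles_per_instruction
    # Usable capacity: load_fraction * word_hops_per_node_cycle * k^2 words/cycle.
    # Offered load:    k^3 / cycles_per_word words/cycle.
    return load_fraction * word_hops_per_node_cycle * cycles_per_word


def critical_node_count(**kwargs) -> int:
    """Largest whole-sided cubic mesh that stays under the bound."""
    return math.floor(critical_mesh_side(**kwargs)) ** 3


if __name__ == "__main__":
    print(critical_mesh_side(), critical_node_count())
    # Doubling the network routing speed doubles the per-node capacity.
    print(critical_node_count(word_hops_per_node_cycle=3.0))
```

Because k scales linearly with both the network speed and the number of cycles between injected words, a faster Cosmos (fewer instructions per word sent) lowers the critical size in the same proportion.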



¹ The J-Machine network can transmit half a word between every pair of adjacent MDPs on every cycle.
Table 7-4. Static Instruction Frequencies

Instruction     Count     Freq.        Instruction     Count     Freq.

DC                440    19.33%        XOR                13     0.57%
READ              324    14.24%        EQUAL              11     0.48%
WRITE             210     9.23%        ENTER              10     0.44%
NOP               173     7.60%        SENDE              10     0.44%
WRITER            104     4.57%        SEND2E             10     0.44%
READR              88     3.87%        NEG                 9     0.40%
BR                 86     3.78%        BZ                  9     0.40%
SEND               80     3.51%        BNZ                 9     0.40%
ROT                64     2.81%        PROBE               7     0.31%
HALT               64     2.81%        EQ                  7     0.31%
ADD                59     2.59%        LT                  5     0.22%
AND                50     2.20%        GT                  5     0.22%
BT                 46     2.02%        NOT                 4     0.18%
CALL               46     2.02%        GE                  4     0.18%
BF                 42     1.85%        BNNIL               4     0.18%
SUB                39     1.71%        FFB                 4     0.18%
CHECK              32     1.41%        LSH                 3     0.13%
OR                 29     1.27%        RTAG                3     0.13%
XLATE              29     1.27%        NEQUAL              1     0.04%
BNIL               25     1.10%        STOP                1     0.04%
LDIPR              23     1.01%        INVAL               1     0.04%
SEND2              22     0.97%        LE                  1     0.04%
LDIP               21     0.92%        MUL                 0     0.00%
SUSPEND            20     0.88%        MULH                0     0.00%
ASH                15     0.66%        CARRY               0     0.00%
WTAG               14     0.62%        NEQ                 0     0.00%

Instruction     Count     Freq.        Instruction     Count     Freq.

Move              726    31.90%        Bit Field         116     5.10%
DC                440    19.33%        Fault              90     3.95%
ALU               256    11.25%        Other              64     2.81%
Branch            221     9.71%        Assoc. Table       47     2.07%
NOP               173     7.60%        STOP                1     0.04%
Network           142     6.24%

Total            2276

The above table includes the static instruction frequencies in the Cosmos kernel and the MDP runtime system.
The second table categorizes the instructions according to their kinds. Each DC is counted twice because it
occupies as much space as two normal instructions. 173 NOPs had to be inserted to align instructions to word
boundaries around DCs and at branch entry points.

Instruction Frequencies 

I collected data on the frequencies of various MDP instructions to provide another estimate of 
what the MDP is doing most of the time. Table 7-4 shows a histogram of the static instruc- 
tion use in Cosmos and the MDP runtime routines, while Table 7-5 shows dynamic instruc- 
tion use in the cold-start sort trial running on 16 MDPs on an input value of 100. Combined 
with the results from Table 7-6, which show the memory reference frequencies, these tables
contain enough information to deduce the approximate¹ number of cycles taken per MDP
instruction.

As shown in Table 7-6, a 16-MDP J-Machine will achieve somewhere between 1.87 and 3.48 
cycles per working instruction when running the sort program on an input of 100. The inter- 
nal-memory-only cycles-per-working-instruction number varied between 1.8 and 2.0 for other 
trials, while the external-memory cycles-per-working-instruction number varied between 3.0 
and 3.9. 



Table 7-5. Dynamic Instruction Frequencies

Instruction       Count     Freq.        Instruction       Count     Freq.

STOP            1143604    43.77%        CHECK             14401     0.55%
READ             309073    11.83%        EQ                14293     0.55%
WRITE            169577     6.49%        LT                11816     0.45%
ROT               78272     3.00%        SEND2E            11220     0.43%
READR             76641     2.93%        NEG               10886     0.42%
AND               71150     2.72%        RTAG               9859     0.38%
XLATE             67230     2.57%        SENDE              9594     0.37%
DC                63981     2.45%        XOR                5637     0.22%
BF                53595     2.05%        FFB                5624     0.22%
SEND              47474     1.82%        NOT                5444     0.21%
ASH               43161     1.65%        EQUAL              4725     0.18%
OR                41948     1.61%        LE                 4296     0.16%
BR                40800     1.56%        ENTER              1117     0.04%
WRITER            39085     1.50%        BNZ                 886     0.03%
SEND2             30783     1.18%        LSH                 786     0.03%
ADD               26468     1.01%        GE                  502     0.02%
NOP               23633     0.90%        BZ                  475     0.02%
SUB               21840     0.84%        BNNIL               420     0.02%
BNIL              20786     0.80%        MUL                 200     0.01%
SUSPEND           20679     0.79%        PROBE                74     0.00%
WTAG              20406     0.78%        NEQUAL               28     0.00%
BT                19497     0.75%        NEQ                   0     0.00%
LDIP              19032     0.73%        MULH                  0     0.00%
CALL              18557     0.71%        CARRY                 0     0.00%
LDIPR             17712     0.68%        HALT                  0     0.00%
GT                15714     0.60%        INVAL                 0     0.00%

Instruction       Count     Freq.        Instruction       Count     Freq.

STOP            1143604    43.77%        Assoc. Table      68421     2.62%
Move             594376    22.75%        DC                63981     2.45%
ALU              283732    10.86%        Fault             55301     2.12%
Branch           136459     5.22%        NOP               23633     0.90%
Bit Field        123724     4.73%        Other                 0     0.00%
Network          119750     4.58%

Foreground      1469377    56.23%
Total           2612981

This particular problem (Sort creating and sorting an array of 100 numbers on 16 MDPs) only kept an average of
56% of the MDPs busy at a time; about 44% of the instructions executed are STOP. Although the frequency of
the STOP instruction varies widely, the relative frequencies of the other instructions are typical for an MDP
program.



1 Some of the instruction row buffer dynamics were simplified and all branches were assumed to take 3 cycles, even 
though sometimes they may take fewer cycles. 
Table 7-6. Memory Access Frequencies

Operating System memory usage:
    Reads:     394430 (0.15/instruction, 0.27/working instruction)
    Writes:    152756 (0.06/instruction, 0.10/working instruction)
    Fetches:  2295682 (0.88/instruction, 1.56/working instruction)

Heap memory usage:
    Reads:     152262 (0.06/instruction, 0.10/working instruction)
    Writes:    138807 (0.05/instruction, 0.09/working instruction)
    Fetches:   317299 (0.12/instruction, 0.22/working instruction)

Total memory usage:
    Reads:     546692 (0.21/instruction, 0.37/working instruction)
    Writes:    291563 (0.11/instruction, 0.20/working instruction)
    Fetches:  2612981 (1.00/instruction, 1.78/working instruction)

3.48 cycles/working instruction
1.87 cycles/working instruction without external RAM

The numbers above indicate the number of memory references (reads, writes, and fetches) done to the operating 
system (everything except the heap) and heap areas of memory by Sort running on 16 MDPs with an input of 100. 
The numbers for the other sample programs are similar. The cycles per instruction figures were calculated by 
adding the instruction frequencies from Table 7-5 weighted by the instruction times together with the memory 
usage frequencies weighted by memory access times. 

The 4096-word internal memory contains all of the operating system data and code and a small portion of the
heap (about 2100 words). The rest of the heap (65536 words) lies in slow external memory. When running on a
real J-Machine, the sort program will achieve somewhere between 1.87 and 3.48 cycles per working instruction
depending on how much of the program and data resides in the internal memory portion of the heap.

Considering that internal memory read, write, and fetch times average 1, 0, and 1/8 cycles¹,
respectively, while external memory read, write, and fetch times are 6, 5, and 3 cycles²,
respectively, a loss of only a factor of two in performance by placing the user program and data
in external memory is surprisingly low. The reason for such a low cycles-per-working-instruction
figure when the user program and data are in external memory is the high Cosmos
overhead. The MDP spends most of its time executing Cosmos code, which decreases the
cycles-per-working-instruction number from what it would otherwise have been. For the same
reason, changes that would reduce Cosmos overhead at the expense of user program size are
undesirable in most cases.
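
As a rough consistency check, the gap between the two cycles-per-working-instruction figures in Table 7-6 can be recovered from the heap reference counts and the access times quoted above. The sketch below assumes that the upper figure corresponds to every heap reference going to external memory; that reading, and all of the names in the code, are assumptions rather than a description of how the original figures were produced.

```python
# Counts from Table 7-6 (sort on 16 MDPs, input 100).
WORKING_INSTRUCTIONS = 1_469_377
HEAP_REFERENCES = {"read": 152_262, "write": 138_807, "fetch": 317_299}

# Access times in cycles, as quoted in the text.
INTERNAL_CYCLES = {"read": 1.0, "write": 0.0, "fetch": 1.0 / 8.0}
EXTERNAL_CYCLES = {"read": 6.0, "write": 5.0, "fetch": 3.0}

INTERNAL_ONLY_CPI = 1.87  # cycles per working instruction without external RAM


def external_heap_penalty() -> float:
    """Extra cycles per working instruction if all heap references are external."""
    extra_cycles = sum(
        count * (EXTERNAL_CYCLES[kind] - INTERNAL_CYCLES[kind])
        for kind, count in HEAP_REFERENCES.items()
    )
    return extra_cycles / WORKING_INSTRUCTIONS


if __name__ == "__main__":
    penalty = external_heap_penalty()
    print(f"penalty: {penalty:.2f} cycles per working instruction")
    print(f"estimated external figure: {INTERNAL_ONLY_CPI + penalty:.2f}")
```

Under that assumption the penalty is about 1.61 cycles per working instruction, which added to 1.87 lands on the 3.48 figure in the table. It also shows why the loss is only about a factor of two: heap references are a small share of all memory traffic, since most fetches are of Cosmos code in internal memory.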



¹ The write time is 0 because it is absorbed by the execution of the WRITE instruction; WRITE does not require any
extra cycles when writing to memory as opposed to a register. Eight instructions can be fetched in one cycle for an
effective fetch time of 1/8 cycle per instruction; the branch instruction cycle counts already include the overhead for
fetching the next set of instructions.

² Two instructions are fetched at a time from external memory in 6 cycles, for an effective fetch time of 3 cycles per
instruction.
7.3. Conclusion

Context Switching Performance 

A large component of the current operating system overhead time is the time taken to save 
and restore contexts, especially in the CFUT fault handler. One possibility to increase the 
speed of the CFUT fault handler is to not save data registers and not copy the message upon 
a CFUT fault [11]. Not saving data registers would reduce the fault handler's time by 4
instructions¹, while not copying the message would reduce it by 6 more instructions. However,
these gains would come at a price: the size of the object code would increase because the
compiler could not effectively allocate variables to registers; it is not clear whether the
savings in the operating system overhead would outweigh the increased time spent executing 
user code, especially if the user code lies in external DRAM, while the operating system lies 
in fast internal SRAM. 

Summary 

Both the derived and measured data indicate that the grain size for running Concurrent 
Smalltalk on the J-Machine is 50 to 70 instructions. Since most functions involve two mes- 
sages (one apply message and one return message), the average number of instructions needed 
to process a function call is between 100 and 140; actually, it is probably closer to 100 be- 
cause of tail forwarding. 

When running entirely from internal memory, the MDP executes one instruction about every 
two cycles; if user programs and data have to be accessed from external memory, that count 
increases to about four cycles per instruction. The network load was calculated assuming a 
fast program (two cycles per instruction) injecting messages into the network at the fastest 
observed rate (one word every eight instructions) and utilizing 100% of the J-Machine's pro- 
cessors. If the messages are sent randomly under the above conditions, the J-Machine net- 
work will saturate when a J-Machine with over 343 MDPs is built. Of course, most programs 
will not be as fast, but some crafted library routines could impose network loads as high as 
indicated above. To prevent network saturation, either the network will have to be made 
faster, the program slower, or some means of exploiting locality invented. 



1 The reduction would be 8 instructions if the data registers did not have to be restored by the reply handler; how-
ever, it is difficult for the reply handler to distinguish the cases in which it has to restore registers because some
unanticipated fault like overflow happened from the cases in which it doesn't; the extra instructions needed to make
this decision would make this optimization not worthwhile.

Although working Concurrent Smalltalk programs have been demonstrated, the Concurrent
Smalltalk programming system is by no means complete. Some suggestions for improve- 
ments were discussed throughout the previous chapters — more optimizations could be added 
to the compiler, distributed objects could be distributed more uniformly, and storage used by 
free BRAT entries and free standard contexts could be placed back into the heap's free stor- 
age pool. 

Nevertheless, the possible modifications are by no means limited to the minor ones listed 
there. The Concurrent Smalltalk programming system is still an evolving research and 
demonstration vehicle, and many issues still have to be addressed before it becomes a truly 
general-purpose system. This chapter lists these issues together with potential approaches 
for addressing them. 

The first section lists features that were left out of the Concurrent Smalltalk implementation 
that are desirable in a full system. These features are useful in many specialized applica- 
tions, but the system can work without them. 

The second section lists the resource management concerns raised by the implementation of 
Cosmos. These concerns include load balancing, garbage collection, name space reuse, fanout 
bottlenecks, and parallelism control. A few ideas are suggested about handling the fanout 
bottleneck and parallelism control problems, but many of these issues are still in the re- 
search stage. 

The third section outlines a few changes that could be made to the MDP architecture that 
would improve the performance of Cosmos and compiled Concurrent Smalltalk programs.

8.1. Features

This section lists additional features that would be desirable in the Concurrent Smalltalk 
environment. The most obvious ones are the current omissions from Cosmos: futures, ar- 
rays, floating point numbers, and overriding primitive methods. In addition, the perfor- 
mance of Concurrent Smalltalk loops could be improved. 

Arrays 

Arrays are already fully implemented in Optimist II — Optimist II can interpret and compile 
code containing arrays. Cosmos, however, does not currently support arrays. When imple- 
mented, they will be added in the form of MDP runtime code in the Runtime.m Cosmos file. 
Ideally four different kinds of arrays will be provided: strings, bit arrays, integer arrays, and 
general object arrays. Strings can pack four characters per word, bit arrays can pack thirty- 
two booleans per word, while integer arrays can, depending on the range of integers supplied, 
pack 1, 2, 4, 8, 16, or 32 integers per word. 
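
The packing densities listed above can be illustrated with a short sketch. It is written in Python rather than MDP runtime code, it treats elements as unsigned values in a 32-bit data word (the tag bits are ignored), and the names pack, get, and elements_per_word are hypothetical helpers, not part of Cosmos or Optimist II.

```python
WORD_BITS = 32  # data bits per MDP word; the 4 tag bits are not modeled


def elements_per_word(max_value):
    """Return the densest packing (32, 16, ..., 1) whose slots can hold max_value."""
    for per_word in (32, 16, 8, 4, 2, 1):
        if max_value < (1 << (WORD_BITS // per_word)):
            return per_word
    raise ValueError("value does not fit in a 32-bit word")


def pack(values, per_word):
    """Pack unsigned integers into 32-bit words, per_word elements to a word."""
    if per_word not in (1, 2, 4, 8, 16, 32):
        raise ValueError("per_word must be 1, 2, 4, 8, 16, or 32")
    width = WORD_BITS // per_word
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for slot, value in enumerate(values[start:start + per_word]):
            if not 0 <= value < (1 << width):
                raise ValueError(f"{value} does not fit in {width} bits")
            word |= value << (slot * width)
        words.append(word)
    return words


def get(words, per_word, index):
    """Read element index back out of a packed array."""
    width = WORD_BITS // per_word
    word = words[index // per_word]
    return (word >> ((index % per_word) * width)) & ((1 << width) - 1)
```

With this layout a string is simply the per_word = 4 case and a bit array the per_word = 32 case; a real implementation would also have to decide how signed integers are represented.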

I expect arrays to be placed in self-contained objects fitting on single nodes rather than trees. 
This will limit arrays to about 200 words each because larger objects will overflow message 
queues when migrated. If large arrays are desired, distributed array classes should be de- 
fined, and perhaps new-simple-array, new-integer-array, new-string, and new- 
boolean-array could automatically allocate distributed arrays if their size arguments are 
large enough. 

Enough primitives have been provided in Concurrent Smalltalk to support almost all com- 
mon array operations efficiently. The map and init methods treat arrays dataflow-style, al- 
lowing elements of arrays to be defined in terms of other elements of the arrays. Cfutures 
could be used in unpacked arrays to prevent elements of arrays from being read before writ- 
ten; if an array is packed, a bitmap of valid elements, perhaps stored in a context, could be 
attached to it. 

Although implementing arrays well on the J-Machine is not particularly difficult, it is quite 
time-consuming and was omitted from this thesis for this reason. 

Overriding Primitive Selectors 

Concurrent Smalltalk allows user programs to override primitive selectors such as + and <, 
thereby allowing the implementation of additional number types such as complex numbers 
and matrices which respond to the traditional numeric operations. While Optimist II permits 
selectors to be overridden in its interpreter, Cosmos does not support this facility, again be- 
cause this feature would be too time-consuming to implement. 

Adding the ability to override primitive selectors will not be as easy as adding arrays, but, 
fortunately, all the hardware building blocks needed are present in Architecture 11B. When 
an instruction is executed on a word with a type not supported by the hardware for that in- 
struction, the MDP faults. When a system call such as Divide is done on words with unsup- 
ported types, the operating system halts. All of the type-related fault handlers and halts will 
have to be implemented; they will have to decode the operation which caused the fault and 
emulate it by performing a standard message send. This emulation will require a lot of at- 
tention to little details and will be error-prone. Also, the context will have to be enlarged. 

For an example of the complexity involved, suppose the user overrides the = method to sup- 
port complex numbers. One of the consequences might be that a BNZ instruction somewhere 
in the program faults because it was executed on a complex number instead of an integer.
The fault handler will have to call the = method to compare the complex number against
zero. In order to make this call, it has to save the entire state of computation in the context
plus two more words: a return IP back to the fault handler and a slot into which the result 
(true or false) should be written. When the fault handler regains control, it will examine 
the slot and either take the branch in the user program or let execution continue with the 
next instruction. Due to CFUT-handling in the MDP's architecture (specifically, because 
write does not fault on cfutures), primitive selectors can never return cfutures. 

Another issue is what to do with commutative operations such as +. One might add an inte- 
ger to a complex number or a complex number to an integer, and it would be nice not to have 
to override the integer method for + to implement complex numbers. To implement this 
cleanly, the fault handler for add would have to try adding the arguments in one order, and, 
if no method matched, reverse the arguments and try again. If no method matched a second 
time, it would halt. 
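
The retry-with-reversed-arguments rule can be sketched as follows. This is an illustration in Python, not the MDP fault handler itself: the method table keyed on (receiver class, selector, argument class), the Complex class, and the NoMethod exception (standing in for a halt) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    re: int
    im: int


class NoMethod(Exception):
    """Stands in for the operating system halting when nothing matches."""


# Hypothetical method table: (receiver class, selector, argument class) -> method.
METHODS = {
    (int, "+", int): lambda a, b: a + b,
    (Complex, "+", Complex): lambda a, b: Complex(a.re + b.re, a.im + b.im),
    (Complex, "+", int): lambda z, i: Complex(z.re + i, z.im),
}


def send_commutative(methods, selector, a, b):
    """Try (a, b) first; if no method matches, try (b, a); otherwise give up."""
    for receiver, argument in ((a, b), (b, a)):
        method = methods.get((type(receiver), selector, type(argument)))
        if method is not None:
            return method(receiver, argument)
    raise NoMethod(f"no method for {selector} on {a!r} and {b!r}")
```

Here the user supplies only the Complex methods; an integer plus a complex number is handled by the second attempt, so the integer method for + is left untouched.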

Finally, a few minor modifications may have to be done to Optimist II's back end to support 
overriding primitive selectors. 

Long Integers 

Once overriding primitive selectors is supported, it will not be particularly difficult to imple- 
ment a bignum package for the MDPs and watch how many microseconds it takes a J-Ma- 
chine to compute the factorial of 1000. 

Futures 

Optimist II currently provides most of the support needed for full futures, although some 
modifications would still be necessary. The major changes would be to the operating system. 
The changes would be similar to those needed to implement primitive selector overriding:
FUT fault vectors would have to be defined, and their handlers would have to emulate all possible cases.

Floating Point Numbers 

There are four different ways to implement floating point facilities on the J-Machine. Rang- 
ing from the easiest to the hardest and most exotic, they are: 

1. Emulate operations on the FLOAT data type through software fault handlers. This approach
would provide IEEE-compatible¹, single-precision floating point number capability.
Unfortunately, this approach would be very slow because of the large instruction decoding 
and floating point packing and unpacking overheads. The advantages of this approach are 
simplicity, transparency, and IEEE compatibility, if desired. 

2. Store floating point numbers as two words each. One word would be the exponent and 
the other the mantissa. The precision would be intermediate between single precision and 
double precision. Floating point operations could be inlined, and micro-optimization tech- 
niques [12] could be applied. The advantage of this approach is speed without the need for 
extra hardware. The disadvantages are that this approach would need object inlining to be 
implemented by Concurrent Smalltalk (otherwise this technique would be even slower than 
technique 1), the floating point number format is nonstandard, and the use of floating point 
numbers would be cumbersome. Since floating point variables would take two words instead 
of one, they would have to be declared as such to avoid losing efficiency, and a variable could 
not efficiently support both floating-point and non-floating-point values. The last restriction 
is a major problem because floating-point versions of all of the methods operating on general 
objects might have to be written and used to achieve good performance. 

3. The third possibility would be the inclusion of a floating-point unit on the MDP. The unit
would require no significant software-visible architectural changes; the arithmetic instructions
would simply start working on words tagged FLOAT, and maybe a Div instruction and a
few control registers would appear. The disadvantage of this approach would be the inclusion
of a hardware floating point unit on the MDP, which would increase the hardware's
complexity. The advantages would be speed, simplicity, transparency, and IEEE compatibility,
if desired.

1 ANSI-IEEE standard 754.

4. The last possibility would be addition of RAP [19] chips to the J-Machine network. RAP 
chips are custom chips that contain a large number of serial floating point units, achieving 
estimated peak performance of 300 MFLOPS per chip. Under this approach, an MDP would 
send floating point calculations to a friendly neighborhood RAP, which would do the calcula- 
tions and respond to the continuation it was given. This approach would work well if the 
floating point calculations were grouped and did not have to be mixed with symbolic process- 
ing. If the MDPs were to perform mainly symbolic processing with occasional floating point 
instructions, the message overheads would make this approach inefficient. This approach 
would require a large investment in operating system and runtime software, and it is not 
immediately clear that it would be faster than approach 3, although the potential payoff is 
large. 

True Loops 

Loops are currently not implemented particularly efficiently in Concurrent Smalltalk. It is 
not clear whether this inefficient implementation will hurt program performance; it does, of 
course, depend on how often loops are used. Using iterators and similar abstractions to step 
through arrays and other data structures is usually preferred to using loops because iterators
might execute in parallel, while loops are inherently sequential. Nevertheless, there might 
be some situations where sequential loops are needed. 

The primary reason for the current, inefficient implementation of loops is the need to ensure 
that a loop does not execute for a long time uninterrupted, preventing other messages from 
being executed at the node and maybe even causing a message queue overflow. Currently 
Optimist II compiles a loop into a function which calls itself tail-recursively, which is a fairly 
large penalty to pay for tight loops. The implementation could be improved to a true loop in- 
side a lambda if the code inside the loop either made at least one full-fledged function call per 
iteration or tail-recursed every few iterations; either case takes care of the message problem. 
Some experimentation is needed in this area to determine the best course of action. 
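
The second alternative, a true loop that gives up the processor every few iterations, can be sketched with Python generators. This is only an analogy under stated assumptions: a generator's yield stands in for the tail-recursive call that lets other queued messages run, and chunked_sum, run_round_robin, and the chunk size of 4 are hypothetical.

```python
from collections import deque


def chunked_sum(values, chunk=4):
    """Sum values in a tight loop, giving up the processor every chunk iterations."""
    total = 0
    for count, value in enumerate(values, start=1):
        total += value
        if count % chunk == 0:
            yield None  # stands in for tail-recursing so other messages can run
    yield total  # final result


def run_round_robin(tasks):
    """Run each task until it yields, then move on to the next queued task."""
    results = {}
    pending = deque(tasks.items())
    while pending:
        name, task = pending.popleft()
        outcome = next(task)
        if outcome is None:
            pending.append((name, task))  # not finished: back of the queue
        else:
            results[name] = outcome
    return results
```

A larger chunk lowers the per-iteration overhead but lets one loop hold the node longer, which is the trade-off the experimentation mentioned above would have to settle.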

Inline Objects 

The largest feature change to the Optimist II compiler would be the addition of inline objects. 
This would be a difficult and error-prone process because all cases have to be handled well; 
these cases include passing an inline object to a function that does not expect one, storing in- 
line objects in contexts, creating pointers to inline objects, and altering inline objects. It is 
likely that if inline objects were implemented, several versions of each function would be 
compiled. One version would be unoptimized, while the others would support inline objects 
as arguments and results. The constant folder would then try to convert unoptimized func- 
tion calls to optimized function calls in the same way it currently converts method calls to 
function calls.

8.2. Resource Management

Concurrent Smalltalk presents the programmer with an ideal model of a machine with an 
unlimited number of processors and an unlimited amount of memory; unfortunately, real 
computers are limited in both the number of processors and the size of memory. Several re- 
source management problems result from the discrepancy between the Concurrent Smalltalk 
ideal and the hardware reality. These problems include reusing memory that can no longer 
be accessed and simulating an unlimited number of processors with a fixed, finite number. 
Additionally, there are a few bottlenecks in the current system that can be ignored in small 
implementations but will become important in large-scale systems. 

Heap Compaction 

The current design of the Cosmos heap compactor compacts the entire MDP heap when a 
storage allocation request exceeds available free memory. This approach works, but it has 
two significant disadvantages, both related to the long time it takes to compact the memory: 

1. On a small J-Machine, the MDP will effectively stop responding until the heap compaction
is done. In the few tens of thousands of instructions it takes the MDP to do the heap
compaction, the other MDPs may run out of things to do and all wait for the stopped MDP.
The heap compaction will effectively stop the entire computer. Soon after the first MDP finishes
its heap compaction, another MDP may start its own compaction, and the process will
repeat.

2. On a large J-Machine, a heap compaction on one MDP will not be enough to stop the 
other MDPs from running; instead, they will continue to run longer and are likely to send 
enough messages to the compacting MDP that its incoming message queue overflows. The 
poor MDP now does not know what to do because it has no free memory into which to put the 
extra messages. 

Finally, the current heap compactor does not compact BRAT entries or standard contexts, 
but it could compact them with a little additional effort. 

An incremental heap compactor would address both of the serious disadvantages of the cur- 
rent heap compactor. It might even be possible to run the incremental heap compactor in the 
MDP's background mode, although the lack of a separate set of fault vectors and a full set of 
registers would pose serious obstacles.

Fanout Bottlenecks 

Cosmos currently assigns one node as a "home" of an object; with few exceptions, if a differ- 
ent node needs a copy of that object, it turns to the home node to get it, and the home node 
takes care of supplying the object. Unfortunately, sometimes many nodes want to use the 
same object simultaneously. Accesses to mutable objects are serialized anyway, so having a 
home node for a mutable object is not such a bad idea; however, there is no reason why ac- 
cesses to immutable objects should be unnecessarily serialized. On the contrary, functions 
are immutable objects, and it would be nice if a function's home node did not have to send a 
copy of the code to every other node on a 65536-MDP computer. 

One solution to this bottleneck would be to assign several home nodes to each immutable ob- 
ject; perhaps the more popular the object would be, the more home nodes it would have. 
When another node needed a copy of that object, it could ask the closest home node. If one 
home node were made special and all the others allowed to purge their copies of the object 
because they could get it from the special home node, this scheme would become a distribution
tree. Brian Totty presents an analysis of distribution trees in [38].

Cosmos also serializes the allocation of distributed objects at one node because of the need to
give each distributed object a unique ID. The allocation process could be parallelized by 
splitting the ID space and making several nodes responsible for allocating distributed ob- 
jects, one for each chunk of addressing space. 
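
A minimal sketch of the split ID space, in Python and under stated assumptions: IDs are plain integers, each allocator node owns one contiguous chunk, and the names IdAllocator and split_id_space are hypothetical rather than part of Cosmos.

```python
class IdAllocator:
    """Hands out IDs from one contiguous chunk [first, limit) of the ID space."""

    def __init__(self, first, limit):
        self.next_id = first
        self.limit = limit

    def allocate(self):
        if self.next_id >= self.limit:
            raise RuntimeError("ID chunk exhausted")
        ident = self.next_id
        self.next_id += 1
        return ident


def split_id_space(total_ids, allocators):
    """Give each allocator node an equal, non-overlapping chunk of the ID space."""
    chunk = total_ids // allocators
    return [IdAllocator(i * chunk, (i + 1) * chunk) for i in range(allocators)]
```

Because the chunks do not overlap, the allocators never need to communicate to keep IDs unique; the cost is that one chunk can run out while others still have free IDs.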

Garbage Collection 

Garbage collection on the J-Machine is currently an open research problem. Parallel garbage 
collection algorithms exist, but they may not work well on the J-Machine. For example, the 
parallel garbage collection algorithm in [29] requires a node to keep track of all of the local 
IDs it sends to other nodes, which would be infeasible for two reasons. First, each MDP
spends a considerable amount of its time sending data onto the network, and its performance
would suffer if it had to record every ID sent. Second, most local IDs become known to other
nodes in the J-Machine, degrading the algorithm's performance.

Perhaps the best solution is a simple mark-and-sweep algorithm run on all MDPs in parallel; 
after all, the combined MDPs have a considerable amount of processing power. Unfortu- 
nately, this approach has three potential problems: 

1. The mark-and-sweep garbage collector has to stop the J-Machine, and it might be diffi- 
cult to stop all processors and allow the messages in the network to land somewhere, espe- 
cially if the messages in the network are blocked because some node is out of memory and 
queue space. 

2. The J-Machine network bandwidth may be insufficient for a mark-and-sweep garbage 
collection. 

3. There may not be enough room on the MDPs for the intermediate storage needed by the 
algorithm. In particular, if all the MDPs immediately start marking their root sets, all mes- 
sage queues will quickly overflow with mark messages. This is a parallelism control prob- 
lem. 

Load Management 

The purpose of load management on the J-Machine is to distribute a parallel computation 
evenly throughout the processors while keeping network congestion low. Load management 
is a very broad current research area. Cosmos and Optimist II include limited attempts to 
balance the load — Optimist II distributes the objects it compiles evenly among the nodes of 
the J-Machine, and Cosmos allocates new objects on random nodes and evaluates applica- 
tions on primitive objects on random nodes to prevent the entire computation from taking 
place on one node. Nevertheless, these are only initial steps to addressing the load manage- 
ment issues. The following are at least some of the load management concerns that should 
be addressed on a large J-Machine: 

• The current system for allocating objects may have to be reevaluated. At least theoreti- 
cally, the current system should perform quite well if all objects are about the same size. If 
the nodes on which objects are allocated are always picked randomly, memory usage on all 
nodes will remain within a few standard deviations of the average memory usage, so even on 
a large J-Machine the probability that a single node's memory overflows can be made expo- 
nentially small. On the other hand, real programs may allocate objects with a large variation 
of sizes, and they may wish to allocate objects on specific nodes to take advantage of locality. 
Both of these conditions may overflow memory on some nodes while other nodes still have a 
considerable amount of free memory. 

• An analogous issue to the one above is handling message queue overflows. Due to the 
queues' small sizes and the large variance in the sizes of messages, it is difficult to make 
queue overflows statistically unlikely. Instead, mechanisms have to be introduced to handle
them. These mechanisms should not allocate extra local memory because a queue overflow is
most likely to happen when little or no memory is available because it is being compacted. 

• MDPSim assumes that the MDPs are connected by a crossbar network, so all MDPs are 
equally far apart from each other. This is a good approximation on a small J-Machine — on a 
64-node J-Machine organized as 4x4x4 MDPs, no two processors are more than 9 links apart, 
while the expected distance between two random nodes is only 3 links. On the other hand, 
on a 65536-node J-Machine organized as 64x32x32, locality becomes an important issue; if 
objects continue to be allocated randomly, the network will become hopelessly congested. 
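
The first concern above rests on the claim that random placement of equal-sized objects keeps every node's memory usage close to the average. A small simulation can make that concrete. It is a sketch only: the node count, object count, and seed are arbitrary choices, every object is assumed to occupy one unit of memory, and simulate_allocation is a hypothetical name.

```python
import random
import statistics


def simulate_allocation(nodes, objects, seed=0):
    """Place equal-sized objects on uniformly random nodes; return per-node counts."""
    rng = random.Random(seed)
    usage = [0] * nodes
    for _ in range(objects):
        usage[rng.randrange(nodes)] += 1
    return usage


usage = simulate_allocation(nodes=64, objects=64_000)
mean = statistics.mean(usage)
spread = statistics.pstdev(usage)
print(f"mean {mean}, std dev {spread:.1f}, min {min(usage)}, max {max(usage)}")
```

Repeating the run with object sizes drawn from a wide distribution, or with some objects pinned to chosen nodes, is one way to explore how quickly the balance degrades in the cases the text warns about.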

There are two general approaches to distributing the load evenly. One approach is to make 
objects very mobile and hope that they will redistribute themselves to exploit locality. When 
a portion of the J-Machine becomes congested, it could simply throw objects at the rest of the 
J-Machine. JOSS hints [38] were an example of a technique that could be used by this ap- 
proach. While this approach is simple, it does suffer from some disadvantages. In particular, 
if load management decisions are made often, each one must be cheap, or the overhead
becomes excessive. Also, when an object migrates often, it is difficult for a node to send a
message to it. In JOSS, if a node does not know where an object is, it sends the message to 
the object's home node instead, which forwards it to the object. If an object is not at the home 
node, then both the home node and the object's current node are congested with messages 
addressed to the object. JOSS attempted to correct this problem through the use of hints, but 
JOSS-style hints may be ineffective because all first-time users of an object must still first 
reference the home node to get to the object. 

The other approach is making objects on the J-Machine relatively static and redistributing 
them to balance the load only occasionally. This is the approach taken in Cosmos. Objects 
are free to move around the J-Machine for short periods of time, but an object's home node 
asks the object to return to it when another node sends a message to the object via the
home node. Hence, objects tend to remain where they were first created. As long as the ob- 
ject allocator allocates objects well, the load will remain roughly balanced. Any small dy- 
namic imbalances that arise can be handled by the garbage collector, which could have the 
power to truly change an object's home node by renaming all of the IDs in the entire J-Ma- 
chine pointing to the object. 

Controlling Parallelism 

In addition to load balancing, which distributes a fixed amount of work among the MDPs, it 
will also be necessary to throttle the amount of work being done by the J-Machine as a whole. 
A simple example illustrates this point. 

(Def method fib Integer () 
(if (<= self 2) 
1 
(+ (fib (- self 1)) (fib (- self 2))))) 

Figure 8-1. A Doubly-Recursive Fibonacci Program 

Consider the doubly-recursive Fibonacci program in Figure 8-1. When run on a sequential 
computer, the program traverses the computation tree of the Fibonacci function in depth-first
order (Figure 8-2), taking only O(n) space but exponential time to compute Fib(n). On
the other hand, when run on the J-Machine, each invocation of fib except the tail ones at- 
tempts to evaluate the two recursive calls in parallel. In effect, the computer traverses the 
computation tree in breadth-first order (Figure 8-3). This is good if there are many proces- 
sors, because then the function is computed in only O(n) time. Unfortunately, this manner of 
computation requires an exponential amount of both main memory and message queue 
space. Thus, a parallel computer can fail if a program exhibiting too much parallelism is run 
on it.

Figure 8-2. Progress of a Sequential Computation

Although the computation consists of a large number N of function invocations, a sequential computer traverses
the computation tree in depth-first order, so only O(log N) functions are active at any particular time (bold gray),
and the "wavefront" of computation consists of only a single invocation (bold black). O(log N) space is required to
run the program and constant-size message queues suffice because the wavefront is at most one invocation.

Figure 8-3. Progress of a Parallel Computation

A parallel computation tends to evaluate the computation tree in breadth-first order, which requires the storage of 
most of the function invocations in the computation tree at about the half-way point. Thus, the computation re- 
quires O(N) space, and, moreover, the "wavefront" can also become as large as O(N). Hence, the computation 
also requires O(N) message queue space. The computation will exceed the parallel computer's memory if N is 
large compared with the number of processors.

When the compiled code for Fibonacci is run on a simulated 4-node J-Machine, Fib(11) is the
largest value that can be computed. An attempt to compute Fib(12) results in queue over- 
flows; enlarging the message queues or spilling them into main memory would not help much 
because the storage needs grow exponentially. 
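
The growth in storage needs can be tabulated with a short sketch. It counts the invocations at each depth of the call tree of the Fib program in Figure 8-1; it is written in Python, and it idealizes breadth-first evaluation as proceeding one whole level at a time, which the real machine does not do exactly. The function name level_widths is hypothetical.

```python
def level_widths(n):
    """Number of fib invocations at each depth of the call tree for fib(n)."""
    widths = []
    level = {n: 1}  # argument -> number of invocations with that argument
    while level:
        widths.append(sum(level.values()))
        next_level = {}
        for argument, count in level.items():
            if argument > 2:  # fib(1) and fib(2) are leaves
                for child in (argument - 1, argument - 2):
                    next_level[child] = next_level.get(child, 0) + count
        level = next_level
    return widths


for n in (8, 12, 16, 20):
    widths = level_widths(n)
    print(f"fib({n}): {sum(widths)} invocations, "
          f"depth {len(widths)}, widest level {max(widths)}")
```

The depth, which bounds what a depth-first evaluation keeps active, grows linearly with n, while the widest level, which approximates the breadth-first wavefront, grows exponentially.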

Fortunately, it appears that a solution to this problem does exist. Why not change from 
evaluating the computation tree in a breadth-first fashion to a depth-first fashion when all of 
the processors on the J-Machine are busy? A seven-instruction change to the compiled code 
for Fib (Figure 8-4) accomplishes just what is needed. The change forces sequential evalua- 
tion of Fib's two recursive calls if the local message queue is more than a quarter full. Thus, 
the computation grows exponentially until all MDPs are saturated. From then on until the 
answer is ready, all MDPs are busy computing the problem without increasing the space re- 
quirements. After the change was made, the Fib program could calculate answers for much 
larger inputs. 
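
The effect of the throttle can be imitated on a single simulated queue. The sketch below is in Python, not MDP assembly, and it is not the code of Figure 8-4: one queue stands in for a node's incoming message queue, the capacity of 64 entries is an arbitrary assumption, and fib_seq stands in for waiting on the first recursive call before starting the second.

```python
from collections import deque


def fib_seq(n):
    """Depth-first evaluation, as a sequential computer would perform it."""
    return 1 if n <= 2 else fib_seq(n - 1) + fib_seq(n - 2)


def fib_queued(n, capacity=64, throttle=True):
    """Evaluate fib(n) from a work queue; return (result, peak queue length)."""
    threshold = capacity // 4  # "more than a quarter full"
    queue = deque([n])
    total = 0
    peak = 0
    while queue:
        peak = max(peak, len(queue))
        k = queue.popleft()
        if k <= 2:
            total += 1  # each leaf contributes 1 to the final sum
        elif throttle and len(queue) >= threshold:
            total += fib_seq(k)  # queue is filling up: evaluate sequentially
        else:
            queue.append(k - 1)  # otherwise spawn both calls in parallel
            queue.append(k - 2)
    return total, peak


print(fib_queued(20, throttle=True))
print(fib_queued(20, throttle=False))
```

Both runs compute the same value, but only the throttled run keeps the queue bounded; with several queues instead of one, the variance problem described in the next paragraph would also appear.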

The simple change in Figure 8-4 is not a panacea, though. The change allows enough paral- 
lelism for the message queues to be a quarter full on the average throughout the J-Machine. 
Unfortunately, in practical simulations the sizes of the queues vary widely — the queues on 
some processors might be empty, while other MDPs may have queues that are more than 
half full. It is easy to see why this might happen: an MDP with a nearly empty queue is not
throttled down and will happily send messages to an MDP with a nearly full queue. Due to 
this variance, the queues overflowed anyway if the threshold for inhibiting parallelism was 
set to half of the queue size. To summarize, it seems that this approach for controlling paral- 
lelism will work, but it may have to be combined with load balancing to keep the variance in 
queue sizes low. 

Name Spaces 

The scarcity of IDs in the 32-bit name space is also an important consideration on the J-Ma- 
chine. After allowing for flags and nonuniform usage of the name space, 32 bits allow only 
about a billion objects to be named on the J-Machine. Furthermore, if the name space is not 
reused, a J-Machine could run out of names in less than a second — each node is limited to 
creating only about 32000 objects before exhausting its name space. 

To solve this problem, object IDs could be collected and reused by the garbage collector¹. The
garbage collector could compact the ID space, which would also permit an ID-renaming load 
balancer almost for free. However, even this approach might not be enough. If the J- 
Machine is implemented using technology of the 1990's, it may well have enough physical 
memory to overflow the 32-bit name space even with garbage collection. At that point the 
only reasonable solution will be to increase the word size, perhaps to 64 bits. 



1 An approach that almost works and does not require a garbage collector is to test each candidate ID in the ID- 
generation routine. If the ID names an existing object, the ID-generator simply chooses another ID. Unfortunately, 
this approach does not work for immutable objects because some copies of such objects could exist with the home 
node not knowing about them. Keeping the home node informed about copies of its immutable objects would cause 
bottlenecks of its own, not the least of which are the space needed to store such information and the network
bandwidth used to maintain it.

Figure 8-4. Modified Fib Assembly Language Function

When the incoming message queue is at least a quarter full, the modified Fib function throttles down the paral- 
lelism by waiting until the result of the first recursive call has been received before starting the second one. The 
modification is shown in bold. No parallelism penalty other than the execution of six extra instructions is paid when 
the J-Machine is not saturated.

8.3. Architectural Considerations

Some architectural modifications could be made that would streamline execution of MDP 
code in critical sections in the operating system. 

Minor Instruction Set Changes 

One set of optimizations with a relatively large payoff would be allowing MOVE instructions 
from the ID registers directly to memory and XLATE instructions directly from memory into 
address and ID register pairs. Introducing these instructions would cut the number of in- 
structions needed to save and restore ID registers on context switches by half, and it would 
accelerate allocation and deallocation of fast contexts. 

A large part of the operating system's running time is still spent saving and restoring state in fault
handlers. Also, most faults point the FIP register to the instruction after the one that faulted,
while most fault handlers (with the notable exception of CALL) would rather resume the in- 
struction that faulted, requiring the FIP to point to the instruction that faulted. Backing up 
the FIP by one instruction takes five or seven instructions depending on whether a free regis- 
ter is available. Pointing the FIP to the instruction that faulted (except for CALL faults) or 
making an extra shadow FIP register that points to the instruction that faulted would reduce 
the number of instructions needed in important fault handlers such as CFUT, EARLY, and 
SEND; the EARLY and SEND fault handlers would be reduced from eight instructions to
one! 

Other critical resources which are near the limits of their capacities are the message queues 
and the XLATE table. The message queues can only be made to hold 1024 words, and the 
XLATE table cannot hold more than 512 bindings. If an MDP has 65536 words of memory, it
might be beneficial to have a 2048-binding XLATE table or a message queue that could hold 
4096 words, especially if large objects are frequently transmitted over the network. 

Another critical resource in the XLATE table is the key space. The XLATE table is a popular 
associative cache in the operating system, and it is used for a variety of purposes. Unfortu- 
nately, there are only 16 tags on the MDP, and tag conflicts exist among the keys XLATEd by 
the users of the XLATE table. For example, class/selector pairs had to be tagged INST1 because
all of the "normal" tags were already taken. A future version of the operating system
might run out of key tags for the XLATE table. Possible solutions to this problem include, 
but are not limited to, providing several XLATE tables or using more than one word as a key. 

Finally, one instruction is seldom used and could be removed. The INVAL instruction is used 
only once in Cosmos in the heap compactor, and since a heap compaction takes a long time 
anyway, emulating inval in software would neither be difficult nor harm performance. 

Fast Context Saves and Restores 

Perhaps a more ambitious project would be to attempt to improve the MDP's context-switch- 
ing time by supporting in hardware a shadow image of the registers in memory. In other 
words, the registers would act as a cache for a context in memory. When a context switch oc- 
curred, the modified registers would be written back into memory and a new register set 
loaded from the new context. Quick register saving and restoring for fault handling is even 
more important than fast context switching, and this approach might be generalized to sup- 
port fault handling as well by allocating a context to each fault handler that wanted one. 
8.4. Conclusion

A number of desirable features for future inclusion in Cosmos or Optimist II were described, 
including arrays, full futures, overriding primitive selectors, floating point numbers, and 
large integers. Implementing arrays and primitive selector override facilities should not pre- 
sent major difficulties, although it will be time-consuming. Several approaches for imple- 
menting floating point numbers were discussed, including two software approaches — a fast 
and dirty one and a clean but slow one — as well as two hardware approaches — including a 
floating point unit on every MDP and including RAP chips in the J-Machine network. 

In addition, a number of resource management issues were discussed, ranging from heap 
compaction, garbage collection, load management, and ID reuse to fanout bottlenecks and 
parallelism control. New methods may have to be developed to support efficient garbage col- 
lection on the J-Machine, but once garbage collection is done, ID reuse and load management 
may be obtained for free. Parallelism control is a serious issue in many applications. An ap- 
plication that tries to operate on a large data set in parallel or explore a large search tree will 
quickly overflow the entire J-Machine's queue capacity. One approach to solving this prob- 
lem was explored — if Concurrent Smalltalk code switches to evaluating itself sequentially 
when the local queue size exceeds a threshold, the total queue size on the J-Machine appears 
to remain bounded, although individual queues may still overflow. This approach shows 
some promise for solving the parallelism control problem. 
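
The threshold idea can be illustrated with a small single-queue sketch in Python. This is not the Concurrent Smalltalk mechanism itself: the function name `count_leaves`, the `threshold` parameter, and the deque standing in for one node's message queue are assumptions made purely for illustration. Work on a binary tree is normally spawned as separate queue entries, but once the queue holds more than `threshold` entries, a subtree is evaluated sequentially instead.

```python
from collections import deque

def count_leaves(depth, threshold):
    """Count the leaves of a complete binary tree of the given depth.

    Each queue entry stands in for one pending message (a subtree).
    Returns (leaf_count, peak_queue_length).
    """
    def sequential(d):
        # Sequential fallback: plain recursion, no queue entries created.
        return 1 if d == 0 else sequential(d - 1) + sequential(d - 1)

    queue = deque([depth])
    leaves = 0
    peak = 1
    while queue:
        d = queue.popleft()
        if d == 0:
            leaves += 1
        elif len(queue) > threshold:
            # Queue is too long: stop spawning and evaluate in place.
            leaves += sequential(d)
        else:
            # Normal case: spawn two "messages", one per subtree.
            queue.append(d - 1)
            queue.append(d - 1)
        peak = max(peak, len(queue))
    return leaves, peak
```

In this toy model the queue can never grow beyond `threshold + 2` entries, because new entries are only added while the queue is at or below the threshold. It models only one queue; it says nothing about how load is distributed across the nodes of a real J-Machine.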

Finally, a few changes to the MDP architecture were proposed. Allowing direct moves to and 
from ID registers and providing the right FIP value after a fault would save instructions in
many critical Cosmos code sections. 

While Cosmos and Optimist form a workable system as they are now, much fine-tuning re- 
mains to be done. Due to a lack of time, a few features of Concurrent Smalltalk have not 
been fully implemented. The door is now open for experimenting with the difficult problems 
of load management, concurrency control, and garbage collection. These areas have not been 
studied very much in the context of fine-grain parallel computers, and there is room for both 
practical and theoretical results. 
Optimist II

Optimist II is a second-generation optimizing compiler for Concurrent Smalltalk, and the 
first to implement nearly the entire revised Concurrent Smalltalk language. Optimist II 
builds upon Optimist by adding an interactive prototyping and debugging environment and a 
few new classes of optimizations. The introduction of global optimizations was especially 
valuable in making Concurrent Smalltalk easy to use efficiently and the runtime system easy 
to write. The greatest advantage of global optimizations is that they permit the programmer 
to divide a system into self-contained abstractions without suffering a performance penalty 
for doing so. There is a trend in modern programming languages towards global optimiza- 
tions 1 , and Optimist II shows that they are both feasible and desirable for a language like 
Concurrent Smalltalk. 

Cosmos 

Cosmos is an optimized operating system for the J-Machine. In addition to performing the 
necessary services to keep the J-Machine running, it includes facilities for function and 
method calls; local and global object allocation, disposal, and migration; method lookup ta- 
bles; distributed object creation and addressing; and various utilities. A few interesting pro- 
gramming techniques were used: an infinite loop broken by a fault is used for block moves 
and sends, and an addressing scheme was developed for distributed objects that allows easy 
addressing of constituents while at the same time distributing them throughout the J-Ma- 
chine and allowing efficient implementation of an operation that returns a nearby con- 
stituent. 

Cosmos was fairly difficult to write due to the constant specter of re-entrancy problems and 
double faults. These errors were the most common problems in JOSS [38]. Nevertheless, 
with the aid of the criticality system those difficulties were overcome. Unfortunately, the ca- 
sualty of this battle with re-entrancy is ease of modification of the Cosmos kernel— the kernel 
is now one compact piece of code. Nevertheless, it should not be necessary to make extensive 
modifications to that kernel in order to add the features mentioned in Chapter 8. 

Debugging 

An important consideration when designing a complicated computer system today is ensuring
that it is debuggable. The hardware world has been buzzing with ideas such as design-for-test
for a few years now; yet, these ideas are just as applicable to software. Thus, Cosmos
includes consistency checks in strategic locations which detect common errors that may be
committed by Concurrent Smalltalk programs. However, even with those checks, debugging a
Concurrent Smalltalk program in assembly language is unpleasant and not as interactive as it could
be; for this reason, Optimist II includes an interpreter which can be used to get a Concurrent
Smalltalk program working before it is run on a J-Machine.

Performance Measurements 

Performance measurements on a simulated J-Machine indicate that the grain size (the number
of instructions executed in response to a message) averages about 60 instructions. Since
most function invocations involve two messages (tail-forwarded invocations being an important
exception), the average number of instructions needed to process a function call is about
100 to 120; the number is lower if many tail-forwarded invocations are made.

1 For example, C++ [37] allows functions to be declared inline, recommending that the compiler inline them in other
functions.

The MDP executes one instruction about every two cycles when running entirely from inter- 
nal memory; when the user program and data are located in external memory, that count 
only doubles to about four cycles per instruction even though the external memory is 5 times 
slower for writing, 3 times slower for reading, and about 24 times slower for fetching instruc- 
tions. The reason for the unusually low cycles-per-instruction number when the user pro- 
gram and data are located in external memory is the high operating system overhead; since 
the operating system is always in internal memory, running operating system code out of in- 
ternal memory tends to pull the cycles-per-instruction number down. 

Under good conditions the MDPs can saturate the network on J-Machines larger than 343 
nodes, although most programs will not execute fast enough for the network to saturate until 
significantly larger J-Machines are used. To prevent network saturation, either the network 
will have to be made faster, the program slower, or some means of exploiting locality in- 
vented. 

Future Work 

Many ideas for future work and research were outlined in Chapter 8. The short-term goals 
are twofold: first, to fill the remaining holes in the implementation of Concurrent Smalltalk; 
in particular, arrays will be useful for running real Concurrent Smalltalk applications; sec- 
ond, to write some nontrivial Concurrent Smalltalk programs and see how well they can uti- 
lize the J-Machine's power. In addition, the load management and parallelism control issues 
in Chapter 8 should be explored. So far development of the Concurrent Smalltalk environ- 
ment has been done without much feedback from applications because until very recently it 
was not possible to run any applications on even a simulated J-Machine. Now that the com- 
piler and the operating system are operative, it will be possible to close the loop and provide 
concrete measures of the J-Machine's performance on real problems. 

Hopes 

Optimist II and Cosmos are but an early step in an evolving base of software for the J-Ma- 
chine. My hopes are that the J-Machine will evolve into a computer competitive with today's 
fastest computers on numerical codes and surpassing them on less-structured but nonethe- 
less computation-intensive Artificial Intelligence applications. 

When I originally wrote this thesis in early 1989, I wrote that I was hoping to be able to run
a Concurrent Smalltalk program on a set of real MDPs. Two years later, during the summer 
of 1991, this wish came true. 
A.1. Introduction

Concurrent Smalltalk (CST) is a concurrent descendant of Smalltalk. It is an object-oriented 
programming language developed for multiple instruction/multiple data concurrent comput- 
ers such as the J-Machine. It is an interesting language for a message-passing concurrent 
computer because it encourages locality and disciplines the use of message-passing. 

Goals 

Concurrent Smalltalk is a high-level language intended for general-purpose programming of 
the J-Machine. It was created and revised with the following goals in mind: 

• Expressiveness. Concurrent Smalltalk must be expressive enough to support the paral- 
lel programming paradigms we desire to research on the J-Machine. In particular, it must 
support object-oriented programming and fine-grained parallelism. Also, since a large part of 
the Concurrent Smalltalk runtime system is written in itself, Concurrent Smalltalk must 
support higher-order features such as reasoning about classes of objects. 

• Consistency. Features which would interact destructively with other features were left 
out. For example, become, although a useful Smalltalk-80 construct, would confuse the type 
semantics so it was left out. 

• Simplicity. Concurrent Smalltalk should be as simple as possible. In order to reach the 
goal of simplicity, Concurrent Smalltalk should consist of a few orthogonal concepts. It is 
very important that Concurrent Smalltalk contain no surprises — one should be able to tell 
what a program should do by reading it. Features involving action at a distance (i.e. having 
a statement invisibly affect another statement far away) were intentionally excluded. 

• Familiarity. Programmers familiar with existing languages should be able to carry over 
their experience to Concurrent Smalltalk. Also, corresponding features should act in the 
same ways, which reinforces the "no surprises" philosophy. In this respect, Concurrent
Smalltalk is most similar to Smalltalk-80, Common Lisp, and Scheme. Hence,
static scoping is used for variables. 

• Efficiency. It is important to be able to compile Concurrent Smalltalk programs into ef- 
ficient machine code. An efficient implementation allows a programmer to concern himself 
primarily with algorithms and implementation rather than performance tuning. Concurrent 
Smalltalk is not a tightly bound low-level language in order to give the compiler latitude in 
optimizing code. 

• Commonality. The sets of built-in classes and methods presented in this language spec- 
ification are by no means minimal. However, the built-in classes are frequently used and 
were included in order to provide a common base for Concurrent Smalltalk programs. The 
inclusion of frequently used classes has three advantages: 

• The built-ins are implemented only once, saving time and effort. 

• The built-ins provide a consistent functional and naming specification. 

• The built-ins can be optimized for efficiency. 
Format



BNF 

The syntax of commands is presented in BNF. Literals are presented in bold, while non-ter- 
minals and metasymbols are plain. There are two enhancements to the BNF syntax: 

The {expr1 | expr2 | ... | exprn} form specifies that each expr can appear at most once, but they
can appear in any order. The symbol = expr form is a macro used for readability. It specifies 
that whenever symbol appears, it should be replaced by expr before any productions are done. 
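
As an illustration of the first enhancement, the following Python sketch checks whether a sequence of items satisfies a {a | b | c} form. The function name `matches_option_set` and the placeholder alternatives are assumptions for illustration only; they are not part of Concurrent Smalltalk.

```python
def matches_option_set(allowed, given):
    """Return True if `given` satisfies a {a | b | ...} form over `allowed`.

    Every item must be one of the allowed alternatives, each alternative
    may appear at most once, and the order is irrelevant.
    """
    seen = set()
    for item in given:
        if item not in allowed or item in seen:
            return False
        seen.add(item)
    return True
```

For example, with alternatives a, b, and c, the sequences "c a" and the empty sequence are accepted, while "a a" is rejected because a appears twice.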

Methods and Functions 

The declarations of methods and functions are presented in a syntax similar to that used by
defmethod. To give an example,

(move what:robot x, y, z:integer theta:float) :result Method 

declares a method called move of class robot that takes a receiver argument what of class 
robot, three integer arguments, x, y, and z, and a float argument, theta. That method 
returns an object of class result. 

Sometimes an abstract class like number is declared that has no direct instance objects; in- 
stead, every object of class number is also an object of one of number's subclasses. Methods 
of an abstract class may or may not have definitions for that class. A method that does not 
have a definition for the abstract class is called an abstract method. For example, + is an ab- 
stract method of class number; there exists no generic method to add two arbitrary numbers. 
Instead, when + is called on two numbers, the definition of + for either the class integer or 
the class float is used. Were a third number subclass, complex-number, defined, it would 
have to define its own + method. On the other hand, the zero? method of class number is 
not abstract because it uses the = method (a method defined on all numbers). Thus,
complex-number does not have to define its own zero? method.
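
The distinction between abstract and non-abstract methods of an abstract class can be sketched in Python using the standard `abc` module. The names `add`, `equals`, and `is_zero` stand in for +, =, and zero?. Treating = as abstract on the number class is an assumption of this sketch, since the text only says that = is defined on all numbers.

```python
from abc import ABC, abstractmethod

class Number(ABC):
    """Abstract class: it has no direct instances."""

    @abstractmethod
    def add(self, other):
        """Stands in for +; every concrete subclass must supply it."""

    @abstractmethod
    def equals(self, other):
        """Stands in for =; assumed here to be supplied by each subclass."""

    def is_zero(self):
        # Stands in for zero?; not abstract, because it only relies on equals.
        return self.equals(type(self)(0))

class Integer(Number):
    def __init__(self, value):
        self.value = int(value)

    def add(self, other):
        return Integer(self.value + other.value)

    def equals(self, other):
        return self.value == other.value

class Float(Number):
    def __init__(self, value):
        self.value = float(value)

    def add(self, other):
        return Float(self.value + other.value)

    def equals(self, other):
        return self.value == other.value
```

A hypothetical third subclass would have to define `add` and `equals` itself but would inherit `is_zero` unchanged, mirroring the complex-number example.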

Abstract methods are indicated by the words Abstract Method on the right side of the dec- 
laration line. 

Optional statements are extensions to the basic Concurrent Smalltalk language. They are 
not guaranteed to be present in all implementations of Concurrent Smalltalk, but if an im- 
plementation supports the capabilities described by optional statements, it should use the de- 
scribed syntax. 
A.2. Syntax
Tokens 

A Concurrent Smalltalk token is an arbitrarily long string composed of the characters A-Z,
a-z, 0-9, _, !, ?, %, +, -, *, /, ., <, =, >, &, @, and ". The characters !, ?, &, and @ may not be
used at the beginning of a token, and a token may not be composed entirely of periods (.) or
underscores (_). Also, tokens beginning with an underscore (_) or a percent sign (%) are re- 
served for system purposes and macros and should not be used by user programs. Case is 
not significant. 

A token is considered to be a number if it consists entirely of the characters 0-9, _, +, -, /, .,
E, or I; it contains at least one digit; it begins with +, -, or a digit; and it does not end with a
sign. These rules are borrowed from Common Lisp. E introduces an exponent, while I can
be used for complex numbers if they are implemented. Any token that is not a number is an
identifier.
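
The number test can be sketched as a small Python predicate. The function name `is_number_token` is hypothetical. The sketch assumes the Common Lisp convention that a number token may not end with a sign, and it folds case because case is not significant in tokens.

```python
import re

# Characters allowed in a number token, per the rule above.
_NUMBER_CHARS = re.compile(r"[0-9_+\-/.EI]+")

def is_number_token(token):
    """Classify a token as a number (True) or an identifier (False)."""
    t = token.upper()  # case is not significant
    return (
        _NUMBER_CHARS.fullmatch(t) is not None      # only number characters
        and any(c.isdigit() for c in t)             # at least one digit
        and (t[0] in "+-" or t[0].isdigit())        # begins with +, -, or a digit
        and t[-1] not in "+-"                       # assumed: does not end with a sign
    )
```

Under these assumptions `42`, `-3.5e10`, and `1/2` are numbers, while `+`, `x1`, `.5`, and `1+` are identifiers.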

Identifiers 

Concurrent Smalltalk uses static scoping of identifiers. Local identifiers shadow identical 
global identifiers, and the meaning of an identifier can be determined by its location in the 
text of the program. Global identifiers are introduced by the following top-level statements 
and their derivatives: 

• Defconstant, to define a constant;

• Defglobal, to define a global variable;

• Defclass, to define a class;

• Defselector, to define a method selector;

• Define, to define one global identifier in terms of another. 

The syntactic sugar defmethod expands into, among other statements, a defselector,
defining a global identifier. Similarly, defun expands into a defconstant statement that
defines the function.

Except for classes, the above categories share a single name space. Redefining a global iden- 
tifier causes an error or a warning unless the new definition is identical to the old one. Class 
names have lower precedence than other global identifiers, so a global constant can shadow a 
class name. 

All macros are global; however, macros are also in a name space separate from the one 
shared by the above categories. Since macros match patterns instead of just names, two 
macros may share the same name. If more than one pattern is applicable, one is chosen at 
the implementation's discretion. Whenever a macro is applicable, it is expanded, unless one 
of the literals specified in the macro pattern is shadowed by a local identifier. 

Local identifiers are introduced by the following statements and their derivatives: 

• Lambda and defun introduce the names of formal parameters.

• Method-Lambda and defmethod introduce self, group, names of the instance
variables, and names of the formals. If a name conflict occurs between the formals, instance
variables, self, and group, the results are unspecified.

• Let, clet, mv-clet, and mv-let introduce names of the locals.

• Lambda, method-lambda, defun, defmethod, block, and loop introduce the names of
continuations.

All of the shadowing rules are summarized in Figure A-1.
Figure A-1. Scopes of Identifiers

The scopes of various kinds of identifiers are shown above. Except for macros, sets of identifiers connected by
thick lines are mutually exclusive and may not contain duplicate names. To find the meaning attributed to an
identifier, follow the arrows from the bold pattern indicating the identifier's usage to the first box that contains the
identifier. For example, if i is encountered in a program, it is first checked to be a local in the innermost scope, then a
local in the next innermost scope, and so on until the global scope is reached. If i is not a valid macro pattern, it is
checked against the globals, parameters, and constants, and finally classes. On the other hand, if #:i is
encountered, i is checked against the names of classes only. #'i searches only globals, parameters, and constants,
both user and predefined.
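
The lookup order described in the caption can be sketched in Python. The function `resolve`, its argument names, and the dictionary representation of scopes are assumptions for illustration. Macro patterns are deliberately left out of this sketch.

```python
def resolve(name, local_scopes, globals_, classes, form="plain"):
    """Return (kind, value) for an identifier, or None if it is unbound.

    local_scopes: list of dicts, innermost scope first.
    globals_:     dict of globals, parameters, and constants.
    classes:      dict of class names.
    form:         "plain" for i, "class" for #:i, "global" for #'i.
    """
    if form == "class":
        # #:i looks at class names only.
        return ("class", classes[name]) if name in classes else None
    if form == "global":
        # #'i looks at globals, parameters, and constants only.
        return ("global", globals_[name]) if name in globals_ else None
    for scope in local_scopes:  # innermost to outermost
        if name in scope:
            return ("local", scope[name])
    if name in globals_:  # globals take precedence over class names
        return ("global", globals_[name])
    if name in classes:
        return ("class", classes[name])
    return None
```

With `true` present both as a global constant and as a class name, a plain reference finds the constant, while the class form finds the class.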
Identifiers that are not defined globally 1 or in any enclosing scope are defined as globals.
They must be defined before they are used. The exceptions to this rule are identifiers en- 
closed in quote or class statements listed below. 

(global identifier) Primitive 

#'identifier

Global returns the global identifier identifier, which, if already defined, must be a global (not 
a class). If identifier is not already defined globally, it is defined as a global. 

(class identifier) Primitive 

#:identifier

Class returns the global class named identifier. Since classes are in a separate name space from other
globals, no error occurs if there is already a global identifier defined with the same name as 
identifier. 

Symbols 

(quote (nil | true | false | identifier | number | character | string)) Primitive 

'(nil | true | false | identifier | number | character | string)

Symbols can be specified by including them in a quote form as above, which can
be abbreviated by a quote mark ('). When presented with an identifier, the quote expression
evaluates to a symbol. Any valid identifier except nil, true, and false can be used;
symbols cannot be captured by any scope, nor can they be globally redefined. Nil, true, and
false are treated specially: (quote nil) returns the null object nil, while
(quote true) returns the boolean true and (quote false) returns the boolean false.
(quote number), (quote character), and (quote string) just return the number,
character, or string.

Constants 

A few constants are predefined. These are listed in Table A-1 below. In addition, any number
can be specified by just including the number. Characters can be specified by preceding
them with #\. Strings can be specified by enclosing them in double quotes ("). Double 
quotes can be included inside strings by preceding them with \. 

Table A-1. Predefined Constants

Constant       Value                    Class
TRUE           True                     True 2
FALSE          False                    False 3
NIL            Nil                      Null
End-of-file    An end-of-file object    Object


1 It is up to the implementation to define the meaning of a global definition here. When a file is compiled, an
implementation might choose to read all of the definitions in the file and then compile the code, or it could compile the
file incrementally. In the latter case forward-referenced identifiers will be considered undefined.

2 Since true is a global constant, #:true has to be used to refer to class true. Also, class true is a subclass of class
boolean.

3 Since false is a global constant, #:false has to be used to refer to class false. Also, class false is a subclass of
class boolean.
A.3. Programs

program ::= (top-level | statement)* 

A Concurrent Smalltalk program is a sequence of top-level forms. Additionally, an imple- 
mentation may allow begin and if as top-level forms if test is a constant expression and 
statements in body, consequent, and alternative are top-level forms. Other statements may 
also be allowed at the top level by extended implementations. Statements at the top level are 
executed sequentially as if they were enclosed in a begin. 

Constant Expressions 

Constant expressions are expressions that have to be evaluated at compile time. A constant 
expression can include any expression or function call, except that constant expressions may 
not produce distributed objects as values and may not call functions that use futures. 

Global Definitions 

All constants, parameters, and globals reside in a single name space; in general, redefining 
an identifier with a different meaning causes an error. Macros reside in a separate name 
space and do not conflict with each other or any other global objects (although they may be 
shadowed by local static scoping). 

(defconstant name[:type] value) Top-level Primitive

Defconstant defines a constant named name. The constant can be any valid Concurrent
Smalltalk type. If type is specified, the value must have that type. Once a constant is de- 
fined, it may not be changed (another constant is accepted, though, if it has the same value). 
Constants encountered in methods are replaced by their values at compile time. Value must 
be a constant expression. Predefined constants are listed in Table A-l. 

Language primitives and built-in functions and selectors are defined as global constants. 

(defparameter name[:type] value) Top-level Primitive

Defparameter defines a parameter named name. The parameter can have any valid Con- 
current Smalltalk type. If type is specified, the value, if present, must have that type. If no 
type is specified, type is assumed to be object, the most general type. Parameters encoun- 
tered in methods are replaced by their values at compile time. Value must be a constant ex- 
pression. Unlike constants, parameters may be redefined at the top level, but their types 
may not be changed. The value of a parameter may not be changed by a running program. 

User functions and selectors defined using defun, defmethod, and defselector are
defined as parameters. Hence, they may be redefined.

(defglobal name[:type] [value]) Top-level Primitive

Defglobal defines a global named name. The global can have any valid Concurrent
Smalltalk type. If type is specified, the value, if present, must have that type. If no type is 
specified, type is assumed to be object, the most general type. Value must be a constant ex- 
pression. 

A global may be defined several times, but only the value from the first definition is used. 
Nevertheless, all definitions of a global must have the same type. 
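
The differing redefinition rules for constants, parameters, and globals can be summarized in a short Python sketch. The class `GlobalTable`, the exception `DefinitionError`, and the string representation of types are all hypothetical. The sketch does not check that a value actually has the declared type, and it treats every conflict as an error although an implementation may issue a warning instead.

```python
class DefinitionError(Exception):
    """Raised when a redefinition violates the rules sketched here."""

class GlobalTable:
    def __init__(self):
        self.entries = {}  # name -> (kind, type, value), one shared name space

    def _check_kind(self, name, kind):
        old = self.entries.get(name)
        if old is not None and old[0] != kind:
            raise DefinitionError(f"{name} already defined as a {old[0]}")
        return old

    def defconstant(self, name, value, type_="object"):
        old = self._check_kind(name, "constant")
        if old is not None and old[2] != value:
            raise DefinitionError(f"constant {name} may not be changed")
        self.entries[name] = ("constant", type_, value)

    def defparameter(self, name, value, type_="object"):
        old = self._check_kind(name, "parameter")
        if old is not None and old[1] != type_:
            raise DefinitionError(f"type of parameter {name} may not change")
        self.entries[name] = ("parameter", type_, value)  # latest value wins

    def defglobal(self, name, value=None, type_="object"):
        old = self._check_kind(name, "global")
        if old is None:
            self.entries[name] = ("global", type_, value)  # first value wins
        elif old[1] != type_:
            raise DefinitionError(f"type of global {name} may not change")
```

The asymmetry is the point of the sketch: a repeated parameter definition replaces the value, whereas a repeated global definition keeps the value from the first one.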
(define name name) Top-level Primitive

This primitive defines the first name as an alias for the object specified by the second name. 
For example, if the second name refers to a global, after this primitive is executed, both 
names will refer to the same global. 

(undef name) Top-level Primitive 

This primitive removes the top-level definition of name, if any. It should be used with cau- 
tion, as it is possible to bring the system into an inconsistent state using undef. 
A.4. Classes

Built-in Classes 

A few classes are predefined. These are listed in Table A-2, and their hierarchy is shown in 
Figure A-2. The defclass primitive can be used to define other classes, which may be based
on the built-in ones shown in bold in Figure A-2. 

Defining New Classes 

(defclass class {class-declaration} superclasses instance-var-spec*) Top-level Primitive

class ::= name
superclasses ::= (class+)
instance-var-spec ::= typed-names | (typed-names {instance-var-declaration})
typed-names ::= name (, name)* [: type]
instance-var-declaration = &inline | &not-inline |
                           &reader names |
                           &writer names | &cwriter names |
                           &cas-er names
names ::= name | (name*)

Defclass defines a new Concurrent Smalltalk class. A class is a template for specifying
objects and methods. Each object belonging to the class contains the instance variables defined
in the class definition as well as the instance variables inherited from its superclasses, if any. 

In the class definition, class is the new class name. It is followed by an optional declaration, 
described later, the class's superclass list, and finally the additional instance variables de- 
clared by the class. 

Class Inheritance 

Each user-defined class must have at least one superclass, but it may have more than one. A 
class inherits the instance variable and method definitions from its superclasses. It may add 
its own instance variables and methods, and it may attempt to override existing methods. If 
a class is overriding a method, the new method must be a subtype of the existing one. 

A simple form of multiple inheritance is allowed. Two or more superclasses may be specified 
for a class under the following conditions: 

• There must be no instance variable conflicts among the superclasses. Formally, this
requirement is satisfied if and only if, out of the superclasses $s_1, s_2, \ldots, s_n$, there is one $s_i$
such that if $v$ is an instance variable of $s_j$, $1 \le j \le n$, then $v$ is an instance variable of $s_i$ or one of
its superclasses.

• There must be no inherited method conflicts among the superclasses. Formally, this
means that if selector $s$ is associated with method $m_i$ for superclass $s_i$ and method $m_j$ for $s_j$,
then $m_i$ and $m_j$ are the same method (textual equivalence of the method code is not enough;
$m_i$ and $m_j$ must "point" to the same method).

The class then inherits all of the instance variables and all methods from all of its super- 
classes. 
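
The two conditions can be checked mechanically, as the following Python sketch suggests. The `ClassDef` record and the function `can_combine` are hypothetical. The first condition is read here as requiring that one superclass already contains every instance variable found in any of the superclasses, and method identity is modeled with Python's `is`.

```python
class ClassDef:
    def __init__(self, name, supers=(), ivars=(), methods=None):
        self.name = name
        self.supers = tuple(supers)
        self.own_ivars = frozenset(ivars)
        self.own_methods = dict(methods or {})

    def all_ivars(self):
        # Own instance variables plus everything inherited.
        result = set(self.own_ivars)
        for s in self.supers:
            result |= s.all_ivars()
        return result

    def all_methods(self):
        # Inherited methods first, then own definitions override them.
        result = {}
        for s in self.supers:
            result.update(s.all_methods())
        result.update(self.own_methods)
        return result

def can_combine(supers):
    """Check the two multiple-inheritance conditions for a superclass list."""
    union = set()
    for s in supers:
        union |= s.all_ivars()
    # Condition 1: one superclass already holds every instance variable.
    if not any(union <= s.all_ivars() for s in supers):
        return False
    # Condition 2: a selector seen twice must map to the very same method.
    seen = {}
    for s in supers:
        for selector, method in s.all_methods().items():
            if seen.setdefault(selector, method) is not method:
                return False
    return True
```

Two superclasses that inherit the same method object from a common ancestor pass the second condition, whereas two separately written methods with identical text do not.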
Instance Variables

After the superclasses in the class declaration is a list of new or redefined instance variables. 
Instance variables without any type are given the type object. An instance variable may be 
specified to have the same name as an instance variable of one of the superclasses. If so, the 
specified type must be a subtype of the original instance variable's type, and either both or 
neither must be inline. 

An instance variable may be declared &inline or &not-inline. These are hints to the
compiler that the variable's object should be placed inline or on the heap (not inline). These 
hints only apply if the variable's type is an inline class. The compiler is free to ignore these 
declarations. 

Reader and Writer Methods 

A few methods are automatically defined when a class C is defined. For each instance vari- 
able x of C, two functionally identical methods are defined, named x and get-x, that, when 
called on an object o of class C, return the value of x in o. These methods are called reader
methods; two are defined in order to avoid name conflicts with instance variables. Similarly,
a writer method put-x is defined that, when called on an instance object o of C and a new
value v of x, assigns v to x in o and returns o. Furthermore, a cwriter method cput-x is
defined that behaves just like the writer method put-x except that it is not strict — it does 
not necessarily touch its second argument v. Finally, a cas-er method cap-x is defined that 
performs an atomic compare-and-put operation: (cap-x o comparison replacement) checks 
whether the value of instance variable x in o is eq to the value of comparison. If so, it stores 
the value of replacement in x and returns true; otherwise, it returns false. 
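
The automatically defined methods can be mimicked in Python as a rough sketch. The helper `define_accessors` and the base class `CstObject` are hypothetical, and underscores replace the hyphens of the default names (`get_x` for get-x, and so on). Two properties are not modeled: the compare-and-put here is not atomic, and the cwriter is simply an alias of the writer because Python has no notion of a non-strict argument.

```python
class CstObject:
    """Stand-in for an object whose instance variables live in a slot table."""

    def __init__(self, **slots):
        self._slots = dict(slots)

def define_accessors(cls, ivars):
    """Attach default reader, writer, cwriter, and cas-er methods to cls."""
    for x in ivars:
        def reader(o, _x=x):
            return o._slots[_x]

        def writer(o, v, _x=x):
            o._slots[_x] = v
            return o  # the writer returns the object, not the value

        def cas(o, comparison, replacement, _x=x):
            # `is` stands in for eq; a real implementation must be atomic.
            if o._slots[_x] is comparison:
                o._slots[_x] = replacement
                return True
            return False

        setattr(cls, x, reader)             # reader named x
        setattr(cls, "get_" + x, reader)    # second reader, get-x
        setattr(cls, "put_" + x, writer)    # writer, put-x
        setattr(cls, "cput_" + x, writer)   # cwriter, cput-x (strictness not modeled)
        setattr(cls, "cap_" + x, cas)       # cas-er, cap-x
    return cls
```

Because the writer returns the object, calls can be chained, for example `cell.put_value(v).get_value()`.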

If it is desirable to produce reader, writer, cwriter, or cas-er methods with names different
from the defaults, the &reader, &writer, &cwriter, and &cas-er options can be used to
specify the new names. More than one method name may be specified for an instance
variable. If &reader, &writer, &cwriter, or &cas-er is used, the corresponding default
method name is not defined. For example, if &writer is used with an empty list of names,
the corresponding writer method name is not defined.

Class Definition Options 

class-declaration = &inline | &not-inline-default |
                    &immutable |
                    &predicate names

A class definition allows several options which are described in more detail below. 

A class may be declared inline, which means that, whenever possible, objects of that class are 
allocated inside other objects or in local variables instead of on the heap. &not-inline-default
is an option for inline classes.

Objects of an immutable class declared with the &immutable option may be shallow-copied
at any time at the system's discretion, which can lead to significant performance
improvements. They are also often passed by value to methods and functions. It is not necessary
that no methods ever write to instance variables, but only that the effects of such writes not
be visible outside the class data abstraction. The compiler is free to ignore &immutable
declarations.

The &predicate option defines the name or names of the class predicate. A class predicate
is a function that returns true when called on an object of the specified class or its sub- 
classes and false on all other objects. The default name of a class predicate is obtained by 
concatenating a question mark (?) to the end of the class name, so (integer? x) tests 
whether x is an integer. 
Inline Classes

When a class is declared &inline, instance objects of that class are often inlined, that is, allocated
inside other objects or local variables. No method dispatching takes place on inlined objects
because the compiler knows the exact types of inlined objects, so inline class methods are
converted to functions. Declaring a class &inline does not alter its semantics except for a few
additional restrictions on its usage. The compiler is free to ignore &inline declarations.

Subclasses of inline classes can be declared under the following restrictions: A subclass of an 
inline class may not declare any additional instance variables, and it may not override any 
methods. The only superclasses allowed for inline classes are classes with no instance vari- 
ables. 

Normally all formals, locals, and instance variables declared with inline classes are inlined
by default. However, that default can be overridden for individual variables by declaring
them &not-inline.¹ The default can be overridden for all variables by declaring the class
&not-inline-default, in which case individual variables can be inlined by declaring them
&inline and giving them the proper type.

Inline classes are useful for representing small objects such as floats and locks which re- 
quire more than one word but for which ordinary object overhead is prohibitive. In general, it 
is pointless to declare a class inline unless it is immutable or its instance objects are rarely 
passed to methods other than the inline class's. 



1 Another way to override this default is to declare the variable's type as object. 

A.5. Methods and Functions

Introduction 

Methods and functions are the basic blocks of computation in Concurrent Smalltalk. Each 
method and function can accept a number of arguments, which are assigned to the formals 
for the duration of the execution of the body of the method or function. Furthermore, a 
method has a special first argument, called the receiver, which contains an object of the 
method's class on which the method was called. In general, methods and functions execute 
concurrently unless explicitly synchronized. This is true even if they are accessing shared 
objects. 

Formals 

formal-spec ::= typed-opt-names | (typed-opt-names {formal-declaration}) 

typed-opt-names ::= opt-name (, opt-name)* [: type] 

opt-name ::= name | _ 

formal-declaration = &value | &inline | &not-inline | &no-leak | &name name

A method's or function's formals are listed when the method or function is declared. Each 
name specifies the name of a formal. Typed-opt-names specifies one or more names sepa- 
rated by commas followed by an optional type. The character _ can be used to indicate an 
unnamed formal; unnamed formals accept arguments but cannot be referenced from within 
the method or function. If type is not present, it defaults to object. If the long form of a 
formal-spec is used, the formals in typed-opt-names can be declared using declarations. 

Arguments are passed by value, just as in Smalltalk-80, Scheme, and Common Lisp. The 
types of the arguments to the method or function must be subtypes of the types of the corre- 
sponding formals. A method or a function may assign a value to a formal, which only 
changes the method's or function's local value. Of course, a method or a function is also free 
to mutate a formal using some other method; such changes are visible to the outside. This 
kind of mutation corresponds to communication via shared objects. 
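
Python uses the same argument-passing convention, so the difference between assigning to a formal and mutating the object it refers to can be shown directly. The function names rebind and mutate are hypothetical and serve only as an illustration of the rule stated above.

```python
def rebind(seq):
    # Assigning to the formal changes only this function's local binding.
    seq = [0, 0, 0]
    return seq


def mutate(seq):
    # Mutating the object through the formal is visible to the caller,
    # because caller and callee share the same object.
    seq.append(99)
    return seq


shared = [1, 2, 3]
rebind(shared)
after_rebind = list(shared)   # snapshot taken after the assignment

mutate(shared)
after_mutate = list(shared)   # snapshot taken after the mutation
```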

A formal may be declared &value, which means that, at the implementation's discretion, the
method or function may be passed a shallow copy of the argument when it is called. Thus,
not only is the formal passed by value, but its first-level structure may also be passed by
value. All formals declared using an &immutable class are automatically declared &value.
&value declarations are especially useful to improve efficiency of inline classes.

A formal may also be declared &inline or &not-inline. These are hints to the compiler
that the formal's object should be placed inline or on the heap. These hints only apply if the 
formal's type is an inline class. The compiler is free to ignore these declarations. 

Declaring a formal &no-leak is a hint to the compiler that the value of the formal is not 
passed out of the method or function, and it will not be referenced after the method or func- 
tion returns. Thus, the implementation is free to perform a shallow deallocate on the value 
of the formal when the function returns. This declaration is especially helpful for arguments 
of type funct. The compiler is free to ignore this declaration.

&name can be used to name an anonymous function or method. The name is saved for debugging
purposes. &name is only allowed in a lambda or a method-lambda.

Return Values

return-specs ::= : type | : : (return-spec*) 

return-spec ::= typed-names | (typed-names {return-declaration}) 

typed-names ::= name (, name)* [: type] 

return-declaration = &value

A method's or function's return specification may be listed when the method or function is 
declared. Most methods and functions return only one value. For these functions, the short 
form, consisting of a colon ( : ) followed by the return type, is adequate. If the return type is 
object, the entire return specification can be omitted altogether. 

The long form of declaring a method's or function's return types uses the double colon ( : : ) 
notation and allows explicit naming of the return continuation. The name is called a contin- 
uation name. Continuation names are lexically scoped and may be referenced in the body of 
the method or function. The syntax and semantics of continuation declarations are analo- 
gous to those of formals, and the continuation names reside in the same namespace as formal 
and variable names. The only declaration allowed is &value. If the short form is used, a default
continuation name, continuation, is used. Some implementations may also allow returning
multiple values. Multiple values do not all have to be returned at the same time, but
all have to be returned at most once by the time the method or function finishes.

Since the implicit return statement at the end of a method's or function's body returns its 
value to continuation, it is an error to allow execution to "fall through" the method or 
function to the implicit return statement unless one of the continuations is named contin- 
uation. 

Method and Function Declarations 

funct-declaration = &non-strict |
(&inline | &not-inline) |
&side-effect-free

The following declarations are allowed for methods and functions: 

• The &non-strict declaration specifies that the arguments do not have to be touched be- 
fore the body of the method or function begins executing. Thus, the method or function may 
at the compiler's discretion receive cfutures in the formals. This declaration is useful mainly 
for inline functions. 

• The &inline and &not-inline declarations specify that the method or function should
or should not be included inline at the points where it is called. This declaration is only a 
hint, and the compiler does not have to obey it. 

• The &side-effect-free declaration is a hint to the compiler that the method or function
does not perform any visible side effects on its arguments or on the global environment.
This information lets the compiler better schedule calls to the method or function. This directive
is also useful on methods and functions that do perform side effects; it tells the compiler
that those side effects are not essential. One example of a method that falls into this
category is a method operating on an immutable class of complex numbers that allows redun- 
dant representations in rectangular or polar form. The method could side effect a complex 
number to calculate its polar representation from its rectangular one, but that side effect is 
not essential for the program to work correctly. 
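
The complex-number example can be sketched in Python as follows. The class Complex and its polar method are hypothetical; the point is only that the cached polar form is a side effect on the object that no caller can observe, which is the situation the declaration is meant to cover.

```python
import math


class Complex:
    """Logically immutable complex number with a redundant polar form."""

    def __init__(self, re, im):
        self.re = re
        self.im = im
        self._polar = None  # redundant representation, filled in on demand

    def polar(self):
        # This write is a side effect, but it is not essential: skipping it
        # (or repeating it) never changes the value any caller observes.
        if self._polar is None:
            self._polar = (math.hypot(self.re, self.im),
                           math.atan2(self.im, self.re))
        return self._polar


z = Complex(3.0, 4.0)
magnitude, angle = z.polar()
```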

The Calling Process 

When a function or a method is called, the values of the arguments are computed and assigned
to the formals. The formals are touched unless the function is declared
&non-strict. After all formal values are evaluated, execution of the method's expressions
proceeds as if the expressions were enclosed in an implicit block: initially the first expression
is evaluated, then the second one, and so forth. The value of the implicit block, which is
the value of the last expression, is returned to the caller unless an exit or return statement
is encountered first.

Scoping of Local Variables 

Local variables are statically scoped. Any lambda, method-lambda, future, or lazy-fu- 
ture created within a method or a function is a full closure and may reference and alter the 
method's or function's local variables. Similarly, the method or function may alter its locals, 
and such changes will be visible to any lambda, method-lambda, future, or lazy-future 
nested within it. 

If concurrency and efficiency are desired, however, such sharing should be avoided whenever 
possible. A lambda, method-lambda, future, or lazy-future should declare its own tem- 
poraries for local computations instead of using ones belonging to an outer static scope. If a 
method or function wants to pass values into a closure, it should initialize the appropriate 
temporaries before the closure is created and not change those temporaries afterwards. The 
closure should not change those temporaries either, unless it wants to pass a result back to 
the method or function that created it. 
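
Python closures follow the same static scoping rules, so the two styles contrasted above can be illustrated with a small sketch. The function names are hypothetical. The first function shares a local with its closure; the second follows the recommended discipline of initializing a value before the closure is created and keeping all temporaries private to the closure.

```python
def shared_local():
    total = 0

    def add(n):
        nonlocal total      # the closure alters the enclosing function's local
        total += n

    add(5)
    total += 1              # the outer change is visible to the closure too
    return total


def private_temporaries(values):
    scale = 10              # initialized before the closure exists, never changed

    def scaled(v):
        tmp = v * scale     # the closure's own temporary
        return tmp

    return [scaled(v) for v in values]
```

The second form leaves nothing for concurrently running closures to contend over, which is why it is preferable when concurrency matters.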

Functions 

(lambda (formal-spec*) [return-specs] {funct-declaration} Primitive 

body) 

Lambda defines and returns an anonymous function. Formal-spec* is a list of the function's 
formals and their types. Return-specs specifies the function's return type, or, if it returns 
multiple values, the number of return values and their types. The function may also have 
declarations, as explained above. Body is a list of statements that form the body of the func- 
tion. 

(defun name (formal-spec*) [return-specs] {funct-declaration} Top-level Macro 

body) 

Defun defines a global function with name name, formals as specified in formal-spec*, return 
values defined by return-specs, optional declarations funct-declaration, and body body. 

Methods 

(method-lambda class (formal-spec*) [return-specs] {funct-declaration} Macro 

body) 

Method-lambda returns a method of class class. The resulting method does not have a se- 
lector. Nevertheless, it can be called as a function if the first argument is an instance object 
of class. The other parameters are as in lambda. 

Method-lambda also introduces into the scope of body the names of the instance variables of 
an object of class class as well as two special variables: self and group. Self refers to the 
first argument of the method call, also known as the receiver object. If class is a subclass of 
distobj, group refers to the group name of the distributed object of which self is a con- 
stituent. 

(defselector selector) Top-level Primitive 

selector ::= name 

Defselector defines name as a selector. This primitive is rarely used explicitly, as all un- 
defined names are assumed to be selectors by default. 

(add-method selector class value) Top-level Primitive

Add-method associates a method with its class and selector. When selector is called with a
receiver object that belongs to class, value is called. Value should be a function or a method.

(method selector class) Primitive 

Method performs the inverse of the add-method operation — it returns the method associ- 
ated with selector and class. If no method is associated with selector and class, method re- 
turns nil. 

(defmethod selector class (formal-spec*) [return-specs] Top-level Macro 

{funct-declaration} 
body) 

Defmethod defines a global method with the given selector and class. The rest of the syntax 
is analogous to defun.

When a method is called, the values of the selector and arguments are computed, and the 
method associated with the selector and the class of the receiver object is found. Of course, 
this method may have been defined for a superclass of the class of the receiver object (i.e. it 
may be inherited). An error occurs if no such method exists. Otherwise, the process of
calling a method is the same as that of calling a function. 
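
The relationship between add-method, method, and method calls can be modelled with a table keyed by selector and class. The following Python sketch is an assumption-laden illustration, not the Cosmos implementation: MethodTable, Shape, and Square are hypothetical names, Python's None stands in for nil, and the choice to make method an exact (non-inheriting) lookup is an assumption, since the text does not say whether method searches superclasses.

```python
class MethodTable:
    def __init__(self):
        self._methods = {}

    def add_method(self, selector, cls, value):
        # Associate value with the (selector, class) pair.
        self._methods[(selector, cls)] = value

    def method(self, selector, cls):
        # Inverse of add_method; None plays the role of nil.
        return self._methods.get((selector, cls))

    def call(self, selector, receiver, *args):
        # Search the receiver's class first, then its superclasses.
        for cls in type(receiver).__mro__:
            found = self._methods.get((selector, cls))
            if found is not None:
                return found(receiver, *args)
        raise LookupError("no method for selector " + selector)


# Hypothetical classes used only for illustration.
class Shape:
    pass


class Square(Shape):
    pass


table = MethodTable()
table.add_method("describe", Shape, lambda self: "shape")
inherited_result = table.call("describe", Square())
```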

A.6. Statements

value ::= statement 
expression ::= statement 
body ::= statement* 

In the definitions below, the non-terminals value, expression, and statement all refer to 
statements, although value usually denotes a side-effect-free statement that is executed for 
its return value, expression denotes a statement that may have side effects but is executed 
mainly for its return value, and statement denotes a statement that is executed mainly for its 
side effects. A body is a sequence of statements executed one after another just like in be- 
gin; the value of a body is the value of its last statement. 

Futures and CFutures 

Futures and cfutures (context futures) are the main means of achieving concurrency in Con- 
current Smalltalk. Both futures and cfutures are promises to produce some value at a later 
time. Forcing a future means forcing the future to fulfill its promise and return its value. 
Analogously, touching a cfuture forces it to calculate and return its value. A force implies a 
touch, so a force never returns a cfuture. 

There are three main differences between futures and cfutures. These are outlined below:

• Futures are guaranteed not to be forced unless they are explicitly forced, while cfutures 
are not guaranteed not to be touched — they may be touched at any time at the compiler's and 
operating system's discretion. In an extreme case, cfutures may be touched as soon as they 
are created, leading to a sequential implementation of Concurrent Smalltalk (except for fu- 
tures). 

• CFutures are generated by almost all primitive operations, while futures are generated 
only by the future and lazy-future primitives and their derivatives. 

• CFutures are always eager — if left alone, they will tend to evaluate to their values. Nor- 
mal futures, on the other hand, may be eager or lazy. A lazy future may not begin to evalu- 
ate to its value until it is forced; if it is never forced, it may never be evaluated. 

The rationale behind creating two kinds of futures is to allow the use of cfutures for most 
tasks where parallelism is desirable but guaranteed parallelism is not necessary for the cor- 
rect operation of the program. CFutures are intended to be very cheap — they can be created 
and touched in a few assembly language instructions. Futures, on the other hand, are re- 
served for the cases like normal-order evaluation where the semantics of delayed evaluation 
are necessary for the program to run correctly. Futures are much more expensive than cfu- 
tures in terms of space and time. 

Both futures and cfutures may have values of complicated expressions as their promises. For
example, if (f 3) = 30, (g 7) = 49, and (h 30 49) = 79, during the execution of the statement

(cset a (h (f 3) (g 7)))

a may be computed in arbitrary order, and f and g need not have returned values by the
time the next statement is executed. If a is later touched, it will assume the value 79.

The semantics become more complicated if the functions f, g, and h have side effects. The 
order of evaluation of arguments of function calls is undefined and may be parallel, so f and 
g may be evaluated in parallel. Furthermore, if h is declared &non-strict (as many built-ins
are), the evaluation of h may overlap with the evaluation of its arguments. If, say, h does
not use the value of its second argument until late in its execution, h may already be executing
while (g 7) is still being calculated. Finally, if h can return without ever requesting the
value of its second argument, (g 7) may never be completely evaluated (since cfutures are
eager, it will keep evaluating, but the entire program may finish before it is done). A good
example of this phenomenon is (and b false), where the program can proceed without
ever determining the value of b.

Argument Evaluation 

Unless a method or a function is declared &non-strict, method and function calls are strict 
with respect to cfutures but not futures — the arguments of a method or function are guaran- 
teed not to be cfutures when the method or function begins evaluation. For example, assum- 
ing no futures are used, in 

(cset a (h (f 3) (g 7)))
(cset b (k 10))
(touch a)
(cset c (l 10))

(f 3) and (g 7) are guaranteed to be done evaluating before (h 30 49) begins evaluating. Also,
(f 3), (g 7), and (h 30 49) are guaranteed to be done evaluating before the evaluation of (l 10)
is started. However, (k 10) can be evaluated concurrently with any of (f 3), (g 7), (h 30 49), or
(l 10).

The arguments of functions are evaluated concurrently. This means they may be evaluated 
sequentially, in parallel, or any combination of the two. Using side effects can sometimes 
lead to deadlock. For example, suppose that the function release-lock releases a global 
lock and acquire-lock waits until the lock is released and then acquires it. Further, suppose
that global-lock is originally acquired. Then, the expression

(h (release-lock global-lock) (acquire-lock global-lock))

can lead to deadlock because the implementation might choose to evaluate acquire-lock
sequentially before release-lock.

Concurrent evaluation order is also distinct from an arbitrary sequential order. For example, 
suppose that c is a local variable with an initial value of 0 and consider the value of the
expression

(cset a (+ (cset c (+ 1 c)) (cset c (+ 1 c))))
(touch a)
(touch c)

Under sequential evaluation of arguments, the final value of a would always be 3 and the final
value of c would always be 2 when this expression completes. Under concurrent evaluation
of arguments, the final value of c could be 1 if, say, both increments were done before either
assignment to c. In this case, a would get the value 2.
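
The possible outcomes can be checked by enumerating every interleaving of the two argument evaluations. The Python sketch below is a model under stated assumptions: each inner cset is split into a read of c and a write of c, c starts at 0, and the value of each cset is the value it wrote. The names run and valid_schedules are hypothetical.

```python
from itertools import permutations


def run(schedule):
    """Execute one interleaving; return the final (a, c)."""
    c = 0
    seen = {}      # value of c read by each argument evaluation
    results = {}   # value returned by each inner cset
    for task, step in schedule:
        if step == "read":
            seen[task] = c
        else:
            c = seen[task] + 1
            results[task] = c
    return results[0] + results[1], c


def valid_schedules():
    steps = [(0, "read"), (0, "write"), (1, "read"), (1, "write")]
    for order in permutations(steps):
        # Within one argument evaluation the read must precede the write.
        if all(order.index((t, "read")) < order.index((t, "write"))
               for t in (0, 1)):
            yield order


outcomes = {run(schedule) for schedule in valid_schedules()}
```

Under this model, outcomes contains exactly the two (a, c) pairs described above: (3, 2) for the sequential orders and (2, 1) when both reads happen before either write.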

(touch expression) Primitive 

(touch expression*) Macro 

If expression is not a cfuture, touch does nothing. Otherwise, touch waits until the value of 
the cfuture is available and then returns that value. It should be kept in mind that if touch 
is used in a subexpression, other subexpressions may or may not continue evaluating while 
this touch is waiting. Also, a touch in a subexpression does not guarantee that the entire 
expression will not yield a cfuture, as is demonstrated in one of the examples above. 

If more than one expression is specified, touch touches them all and returns the value of the 
last one. If no expressions are specified, touch returns nil. 

Touch does not have any effect on futures. 

(force expression) Primitive

! expression Macro 

(force expression*) Macro 

If expression is not a future or a cfuture, force does nothing. Otherwise, force waits until 
the value of the future or cfuture is available and then returns that value. That value is 
guaranteed not to be a future or a cfuture. 

If more than one expression is specified, force forces them all and returns the value of the 
last one. If no expressions are specified, force returns nil. The ! expression form is a 
shorthand for (force expression). 

(future expression) Primitive 

(lazy-future expression) Primitive 

Future and lazy-future both return futures that promise to evaluate expression when 
forced. The futures are guaranteed to evaluate in parallel with all other processes unless 
explicitly synchronized. Future and lazy-future differ in that future begins evaluating 
its expression immediately, while lazy-future waits until it is forced before it starts 
evaluating its expression. In any case, expression is evaluated at most once, no matter how
many times it is forced. 
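
The at-most-once guarantee of lazy-future can be modelled in Python as shown below. This is a sketch of the semantics only; it says nothing about how the J-Machine represents futures. LazyFuture and force are hypothetical names, and a zero-argument Python function stands in for the delayed expression.

```python
import threading


class LazyFuture:
    """Delays a computation until forced; runs it at most once."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._lock = threading.Lock()
        self._done = False
        self._value = None

    def force(self):
        with self._lock:
            if not self._done:
                self._value = self._thunk()
                self._done = True
                self._thunk = None   # the expression is never needed again
        return self._value


def force(x):
    # A non-future is returned unchanged; the result is never a future.
    while isinstance(x, LazyFuture):
        x = x.force()
    return x


calls = []
lazy = LazyFuture(lambda: calls.append("evaluated") or 42)
calls_before_force = len(calls)
first = force(lazy)
second = force(lazy)
```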

Caveats: The actual time when a future is forced is sometimes rather fuzzy, especially in 
the presence of inlined primitives and side-effect-free functions, so the guarantee in the pre- 
vious paragraph may not apply in the code just before a future is forced (the extent of this 
fuzzy section of code is still to be determined). Also, futures should not return objects of 
classes that can be inlined — doing this may force the future immediately at any point. These 
caveats should not present problems unless futures have intricate side effect dependencies. 

Application Statement 

(funct arg*) Primitive 

funct ::= expression 
arg ::= expression 

The first item of an application statement is either a method selector or a function. If it is a 
selector, the method corresponding to the selector and the class of the first argument is called 
using the arguments provided. If it is a function, it is applied to the specified arguments. 
The first item can also be any expression that evaluates to an object of type funct. The 
value of the application statement is either the return value or a cfuture promising that 
value. 

The order of evaluation of arguments is not specified; in fact, some of them may be (but are 
not guaranteed to be) evaluated concurrently. The arguments are not guaranteed to be 
touched before being passed to the funct; some of them may be passed to the funct as futures
or even cfutures. (However, all user-defined methods and functions not explicitly declared
&non-strict will touch their arguments before their code begins executing.) For example,
(cset a (+ a)) does not touch a, and (and b false) does not touch b.

Type Assertion 

(:type expression) Primitive 

The type assertion statement asserts that the type of expression's value is a subtype of type. 
It returns expression's value. The compiler is not required to generate an error if expression 
evaluates to a value that is not a subtype of type, but it may do so. 

Variable Bindings

(clet (binding-spec*) body) Primitive 

binding-spec ::= typed-opt-names | (typed-opt-names {variable-declaration} [value]) 

typed-opt-names ::= opt-name (, opt-name)* [: type] 

opt-name ::= name | _ 

variable-declaration = &inline | &not-inline

Clet creates local variable bindings and evaluates body within the scopes of those bindings. 
Each name specifies the name of a new variable. Typed-opt-names specifies one or more 
names separated by commas followed by an optional type. The character _ can be used to in- 
dicate an unnamed local variable; unnamed local variables can be used to evaluate the initial 
value expression without binding a name in the static scope. If type is not present, it de- 
faults to object. If the long form of a binding-spec is used, the variables in typed-opt-names 
can be declared using declarations and can be given an initial value. Value, the initial value,
is an expression evaluated outside the scope of the clet. Each initial value is evaluated only
once, even if it is assigned to more than one variable. The new variables are bound concurrently.
Their initial values may be evaluated concurrently, and they are not guaranteed to
be touched by the time body begins executing; in body the new variables may still contain
cfutures.

A variable may be declared &inline or &not-inline. These are hints to the compiler that 
the variable's object should be placed inline or on the heap. These hints only apply if the 
variable's type is an inline class. The compiler is free to ignore these declarations. 

The value returned by a clet is the value returned by the last statement in body. 

(let (binding-spec*) body) Macro 

Let is the same as clet except that all newly-bound variables are touched before body be- 
gins executing. As with clet, the initial values are evaluated concurrently. 

(cset name expression) Primitive 

Cset sets the variable name to expression. The variable gets either the touched value of ex- 
pression or a cfuture promising to evaluate expression. The value returned by a cset is the 
value of expression. 

(set name expression) Macro 

Set sets the variable name to the value of expression. The value is touched before it is as- 
signed to the variable, so the variable will not contain a cfuture or a future after this state- 
ment. The value returned by a set is the touched value of expression. 

(cas name comparison replacement) Primitive 

comparison ::= expression 
replacement ::= expression 

CAS (compare-and-set) is an atomic¹ operation that checks whether the value of variable
name is eq to the value of comparison. If so, the value of replacement is stored in variable 
name and cas returns true; otherwise, cas returns false. The value of variable name is 
never a cfuture when cas completes. 
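
The behaviour of cas can be modelled in Python with a lock-protected cell. This is a sketch under stated assumptions: the class Cell is hypothetical, the lock stands in for whatever mechanism makes the real operation atomic, and Python's identity test (is) plays the role of eq.

```python
import threading


class Cell:
    """A variable supporting an atomic compare-and-set."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def cas(self, comparison, replacement):
        with self._lock:
            if self._value is comparison:   # identity comparison, like eq
                self._value = replacement
                return True
            return False

    def get(self):
        with self._lock:
            return self._value


FREE = object()   # hypothetical marker values for a simple lock variable
HELD = object()

semaphore = Cell(FREE)
first_attempt = semaphore.cas(FREE, HELD)    # succeeds: the cell was FREE
second_attempt = semaphore.cas(FREE, HELD)   # fails: the cell is now HELD
```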



1 In the current implementation, in order for cas to be atomic, neither name nor replacement can be a future. If
replacement could be a future, it should be forced before a cas is done. There is no easy solution if name could be a 
future. Fortunately, there is usually little reason to store a future in a semaphore. 

Multiple Values

The constructs below are used for receiving multiple values from methods and functions. 
Multiple values may not be supported by all implementations of Concurrent Smalltalk. 

(mv-cset (name*) (funct arg*)) Optional Primitive 

MV-cset sets the variables name* to the multiple values returned by (funct arg*). The variables
get either the touched return values of (funct arg*) or cfutures promising to evaluate
them. Some of the return values may be available before others. Mv-cset returns nil.

(mv-set (name*) (funct arg*)) Optional Macro 

MV-set is just like mv-cset except that it touches all variables in name* before continuing. 

(mv-clet (mv-binding-spec*) (funct arg*) body) Optional Macro

mv-binding-spec ::= typed-opt-names | (typed-opt-names {variable-declaration}) 

typed-opt-names ::= opt-name (, opt-name)* [: type] 

opt-name ::= name | _ 

variable-declaration = &inline | &not-inline

(mv-let (mv-binding-spec*) (funct arg*) body) Optional Macro 

MV-clet and mv-let are just like clet and let except that they initialize the new variables
to the values returned by (funct arg*).

Syntactic Sugar 

[arg*] Macro 

This form is equivalent to (get arg*).

(cset (funct arg*) expression) Macro 

When the first argument of a cset is a function or a method call, cset is desugared into another
function or a method call. The above form is converted to (funct' arg* expression),
where the identifier funct' is obtained by prepending the characters cput- to
the identifier funct, unless:

funct is get, in which case funct' is cput;

funct is get-x, in which case funct' is cput-x (x is any sequence of characters);

funct is put, put-x, cput, cput-x, cap, or cap-x, in which case an error occurs.

Funct must be a function name or a method selector. It may not be an expression or a variable
reference. (funct arg*) may, however, be a macro or contain macros; these macros are
expanded before the above conversion takes place.

For example, (cset (first sequence) 3) is converted to (cput-first sequence 3), 
while (cset [big-array 7] 12) is converted to (cput big-array 7 12). 

(set (funct arg*) expression) Macro 

When the first argument of a set is a function or a method call, set is desugared into another
function or a method call. The above form is converted to (funct' arg* expression),
where the identifier funct' is obtained by prepending the characters put- to
the identifier funct, unless:

funct is get, in which case funct' is put;

funct is get-x, in which case funct' is put-x (x is any sequence of characters);

funct is put, put-x, cput, cput-x, cap, or cap-x, in which case an error occurs.

Funct must be a function name or a method selector. It may not be an expression or a variable
reference. (funct arg*) may, however, be a macro or contain macros; these macros are
expanded before the above conversion takes place.

For example, (set (first sequence) 3) is converted to (put-first sequence 3),
while (set [big-array 7] 12) is converted to (put big-array 7 12).

(cas (funct arg*) comparison replacement) Macro 

When the first argument of a cas is a function or a method call, cas is desugared into another
function or a method call. The above form is converted to
(funct' arg* comparison replacement), where the identifier funct' is obtained by prepending
the characters cap- (compare-and-put) to the identifier funct, unless:

funct is get, in which case funct' is cap;

funct is get-x, in which case funct' is cap-x (x is any sequence of characters);

funct is put, put-x, cput, cput-x, cap, or cap-x, in which case an error occurs.

Funct must be a function name or a method selector. It may not be an expression or a variable
reference. (funct arg*) may, however, be a macro or contain macros; these macros are
expanded before the above conversion takes place.
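
The renaming rule shared by the cset, set, and cas sugar can be written as one small function. The Python sketch below is illustrative only: desugar_name is a hypothetical helper, and its prefix argument is assumed to be "cput" for cset, "put" for set, and "cap" for cas.

```python
FORBIDDEN = ("put", "cput", "cap")


def desugar_name(funct, prefix):
    """Return funct' for the accessor name funct under the given prefix."""
    for bad in FORBIDDEN:
        # put, put-x, cput, cput-x, cap, and cap-x are all errors.
        if funct == bad or funct.startswith(bad + "-"):
            raise ValueError("cannot assign through " + funct)
    if funct == "get":
        return prefix                       # get   -> cput / put / cap
    if funct.startswith("get-"):
        return prefix + funct[len("get"):]  # get-x -> cput-x / put-x / cap-x
    return prefix + "-" + funct             # first -> cput-first, and so on


cset_first = desugar_name("first", "cput")
set_get = desugar_name("get", "put")
cas_get_x = desugar_name("get-x", "cap")
```

With this rule, (cset (first sequence) 3) maps to a call of cput-first and (set [big-array 7] 12) maps to a call of put, matching the examples in the text.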

Flow of Control 

(begin body) Primitive 

Begin evaluates the statements in body sequentially, touching each one except the last be- 
fore it begins the next, and returns the untouched value returned by the last one. If there 
are no statements in body, begin returns nil. 

(nconcurrently statement*) Macro

(concurrently statement*) Macro

?statement Macro

These macros evaluate the statements in statement* concurrently and return nil.
Concurrently waits until all statements have finished executing before returning, while
nconcurrently does not. ?statement is an abbreviation for (nconcurrently statement).

(nparallel statement*) Macro

(parallel statement*) Macro

These macros evaluate the statements in statement* in parallel and return nil. Parallel 
waits until all statements have finished executing before returning, while nparallel does 
not. The parallelism is guaranteed, which makes parallel a much more expensive state- 
ment than concurrently. In most cases concurrently should be used instead unless par- 
allel semantics are explicitly required. 

(if test consequent [alternative]) Primitive 

test ::= expression 
consequent ::= expression 
alternative ::= expression 

If evaluates the test expression, which must return either true or false. If it returns
true, the consequent expression is evaluated and its value returned; otherwise, the alternative
expression, if any, is evaluated and its value returned. If is not guaranteed to touch the
test value. However, it is guaranteed to evaluate only the appropriate arm of the conditional.

Loops 

(while test body) Macro 

test ::= expression 

While evaluates the test expression, which must return either true or false. As long as it 
returns true, body is evaluated and test reevaluated. When test evaluates to false, while 
returns nil. 

(repeat body until test) Macro

test ::= expression 

Repeat first evaluates body and then the test expression, which must return either true or 
false. As long as it returns false, repeat goes back to evaluating body. When test evalu- 
ates to true, repeat returns nil. 
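
The difference between the two loops is only where the test is evaluated: while tests before each execution of body, repeat after it. A minimal Python model, in which the hypothetical helpers while_loop and repeat_until take the test and body as zero-argument functions and None stands in for nil, makes this explicit.

```python
def while_loop(test, body):
    # Test first: body may run zero times.
    while test():
        body()
    return None


def repeat_until(body, test):
    # Body first: body always runs at least once.
    while True:
        body()
        if test():
            return None


while_runs = []
while_loop(lambda: False, lambda: while_runs.append(1))

repeat_runs = []
repeat_until(lambda: repeat_runs.append(1), lambda: True)
```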

Primitive Control 

(block continuation body) Macro 

Block is just like begin except that it allows the use of return and reply statements to 
leave it. The statements in body are evaluated as in a begin. Continuation specifies the 
block's continuation for use in return and reply statements. 

(loop continuation body) Macro 

Loop defines a loop body. The statements in body are evaluated as in a begin, except that 
after the last statement in body has been evaluated, the first statement is evaluated again, 
and so on. The loop does not terminate unless an explicit return or reply statement is en- 
countered. Continuation specifies the loop's continuation for use in return and reply state- 
ments. 

Returning Values 

Since the last expression in the method code is implicitly returned to continuation, the 
statements below are necessary only if it is desired to return a value from the middle of a 
method or function, if a block or loop should be terminated, if multiple values are being re- 
turned, or if a value is returned to a continuation with a name other than continuation. 
Reply and exit should be used with caution, as exit may cause the caller to hang, while 
reply may cause the caller to crash if two replies are inadvertently sent. Care must be taken 
to reply to each continuation at most once — sending a second reply to a continuation will al- 
most certainly cause a system crash, and it is quite difficult to protect the system against 
this type of error. When using reply it is important to remember that there is an implicit 
reply of the last expression in the method code to continuation. 
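
The at-most-once rule for replies can be illustrated with a small model. The sketch below is written in Python rather than Concurrent Smalltalk, and the names Continuation and method_body are hypothetical. It assumes that a second reply is detected and reported as an error, whereas the text above warns that a real system will most likely crash.

```python
class Continuation:
    """Toy stand-in for a continuation that accepts at most one reply."""

    def __init__(self, name="continuation"):
        self.name = name
        self.replied = False
        self.value = None

    def reply(self, value):
        # Assumption: the model raises an error on a second reply; the
        # real system is described as likely to crash instead.
        if self.replied:
            raise RuntimeError(f"second reply sent to {self.name}")
        self.replied = True
        self.value = value


def method_body(continuation, x):
    """Models a method that may reply from the middle of its code."""
    if x < 0:
        continuation.reply(0)   # explicit reply from the middle
        return                  # plays the role of exit: no implicit reply follows
    # Models the implicit reply of the last expression to continuation.
    continuation.reply(x * 2)
```

Omitting the early return in the negative branch would send two replies to the same continuation, which is exactly the error the paragraph above cautions against.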

Continuations 

Continuations are introduced by lambda, method-lambda, defun, defmethod, block, 
loop, future, lazy-future, parallel, and nparallel. The continuations defined by 
future, lazy-future, parallel, and nparallel are not externally accessible. Lambda, 
method-lambda, defun, and defmethod define the default continuation continuation un- 
less told otherwise. They also reply to continuation if allowed to complete executing with- 
out an intervening exit. Thus, care must be taken when using nested function and method 
definitions to make sure that reply and return reply to the right continuation. 

Continuation manipulation can become quite complicated, and not all features have to be 
supported by all implementations. A minimal implementation only has to allow replying to 
the innermost construct that defines continuations; hence, an implementation may restrict 
non-local replies. Furthermore, an implementation does not have to support replying out of a 
future, lazy-future, parallel, or nparallel statement, since these also introduce con- 
tinuations. A more sophisticated implementation may allow replies to all continuations ac- 
cessible in the current lexical scope. Finally, an advanced implementation may choose to 
make continuations first-class values of class # : continuation and allow them to be stored 
in variables. 

(exit) Primitive 

Exit is a statement that hangs, never returning a value. In most cases exit can be thought 
of as exiting the current method or function, but it does not necessarily do so if used in a 
cset, concurrently, nconcurrently, parallel, nparallel, block, loop, future, or
lazy-future statement, let or clet bindings, or some other statement that permits paral- 
lel execution without synchronization. 

(reply expression) Macro 

(reply (continuation expression)*) Primitive 

The first variant of reply evaluates expression and sends its value to continuation. Exe- 
cution then proceeds with the next statement of the current method, if any. Reply is not 
strict — it may reply a future or a cfuture. The value of a reply statement is nil. 

The second variant of reply is used to return values to named continuations. The reply 
takes an even number of arguments; within each pair, the first argument is the continuation 
name and the second one its value. 

(return expression) Macro 

(return (continuation expression)*) Macro 

Return is equivalent to a reply followed by an exit — the values of the expressions are sent 
to the caller, and the execution of the method or function terminates subject to the caveats in 
the exit statement description. 

(return-value-expected?) :boolean Function

(return-value-expected? continuation) :boolean Function

Return-value-expected? returns true if the caller of the method or function is expecting 
a reply for continuation (or continuation if continuation is not specified). It is not guaran- 
teed to return false otherwise, so an implementation that always returns true is accept- 
able. 

A.7. Built-in Methods and Functions



Built-in Classes 

Built-in classes are provided for reasons of efficiency and convenience. Many methods on 
built-in classes are compiled into single assembly language instructions instead of method 
calls, improving their speed greatly. Other built-in classes may be defined by methods writ- 
ten in assembly language and linked with the programs generated by the compiler. Arrays 
may be defined this way. The built-in classes are listed in Table A-2, and their hierarchy is 
shown in Figure A-2. 



Table A-2. Built-in Classes

Class               Metaclass           Values
------------------  ------------------  --------------------------------------------------
Null*               Primitive-Class     Nil
Symbol              Primitive-Class     Symbols, including nil, but not true and false
True*               Primitive-Class     The boolean true
False*              Primitive-Class     The boolean false
Boolean             Primitive-Class     The booleans true and false
Character           Primitive-Class     ASCII characters
Small-Integer       Primitive-Class     Integers representable in a machine word¹
Large-Integer       Primitive-Class     Integers not representable as Small-Integers
Integer             Primitive-Class     Arbitrary-sized integers
Float               Primitive-Class     Floating-point numbers²
Real                Standard-Class      Real numbers
Number              Standard-Class      Arbitrary numbers
Magnitude           Standard-Class      Numbers, characters, and booleans
Primitive-Class     Primitive-Class     Primitive classes defined by Concurrent Smalltalk
Standard-Class      Primitive-Class     Standard (non-distributed) classes
Distributed-Class   Primitive-Class     Distributed classes
Class*              Primitive-Class     General classes
Function            Primitive-Class     Functions, methods, and closures
Funct               Primitive-Class     Functions, methods, closures, and method selectors
System-stream       Primitive-Class     System-defined streams
Stream              Standard-Class      Sources of input or destinations for output
Simple-Lock         Primitive-Class     Very cheap and simple locks
Queueing-Lock       Primitive-Class     More expensive locks that queue pending tasks
Lock                Standard-Class      General locks
Integer-Array       Primitive-Class     Small arrays of integers
String              Primitive-Class     Small arrays of characters
Boolean-Array       Primitive-Class     Small arrays of booleans
Simple-Array        Primitive-Class     Small arrays of arbitrary objects
Array               Standard-Class      Arrays of arbitrary objects
Collection          Standard-Class      Indexed collections of objects
Distobj             Distributed-Class   All distributed objects
Object              Standard-Class      All first-class values



The metaclass of a class is the class of the class object itself. Metaclasses govern certain 
aspects of class behavior such as inheritance and the action of new. Only classes having 
standard-class or distributed-class as a metaclass permit user-defined subclasses. 
At the implementation's discretion some classes with primitive-class as a metaclass may 



* This class name conflicts with another global name, so it has to be preceded with t : whenever it is used.
¹ Currently a machine word is 32 bits, so the small-integer range is -2147483648 to 2147483647.






[Figure A-2: tree diagram of the built-in class hierarchy, rooted at Object]

Figure A-2. Hierarchy of built-in classes 

The superclasses are shown to the left of their subclasses. All classes are subclasses of object. Classes 
with metaclass primitive-class are shown in bold, classes with metaclass standard-class are shown in 
standard type, and the one class with metaclass distributed-class is shown in italic. User-defined classes 
may be defined as subclasses of any of the classes having standard-class or distributed-class as a 
metaclass. 

actually be instances of standard-class or distributed-class, but portable programs 
should not rely on this. 

Built-in Methods 

Built-in methods are provided for the basic arithmetic and logical operations. The methods 
are explained in the following sections. Since some built-in method calls compile into assem- 
bly language instructions, some restrictions are necessary on the use of their selectors. 
Specifically, if any other methods are defined using the selectors in Table A-3, they must 
obey the identities listed in Table A-4. 

Redefining Restricted Selectors 

If a restricted selector is called with an argument that is not one of the built-in classes it rec- 
ognizes, the actual method for the class is found and executed, possibly after some of the iden- 
tities in Table A-4 have been applied. Thus, it is possible to define a class of type, say, com- 
plex, and define a method * for numbers of that type. That method will be called whenever 
* is used on a number of type complex, regardless of whether that number is the first or sec- 
ond argument. If both complex numbers and quaternions are defined, the complex * method 
should be prepared to handle a quaternion as the second argument, while the quaternion * 
method should be prepared to handle a complex number as the second argument. The re- 
verse methods have been added to handle the case of a non-built-in object being the second 
argument of a noncommutative operation. The <>, <=, and >= methods should never be rede- 
fined, as they are never called. Redefine =, >, and < instead. 
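
The role of the reverse methods can be sketched by analogy. The example below is Python, not Concurrent Smalltalk, and the class Gaussian is hypothetical. It relies on Python's reflected-operator protocol, which is only a loose analogue of the mechanism described above: when the first argument of - is a built-in number and the second is user-defined, Python calls the second argument's __rsub__, much as reverse-- would be called.

```python
class Gaussian:
    """Hypothetical user-defined number re + im*i with integer parts."""

    def __init__(self, re, im=0):
        self.re, self.im = re, im

    def __eq__(self, other):
        return (isinstance(other, Gaussian)
                and (self.re, self.im) == (other.re, other.im))

    def __sub__(self, other):
        # Plays the role of the - method: self is the first argument.
        other = other if isinstance(other, Gaussian) else Gaussian(other)
        return Gaussian(self.re - other.re, self.im - other.im)

    def __rsub__(self, other):
        # Plays the role of reverse--: self is the SECOND argument of -.
        return Gaussian(other) - self
```

With these definitions, both Gaussian(5, 1) - 2 and 2 - Gaussian(5, 1) reach code in the user-defined class, which is the behavior the reverse methods are meant to provide for noncommutative operations.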



2 Floating point numbers may not be implemented in all Concurrent Smalltalk implementations. 

The associative restricted selectors allow an arbitrary number of arguments; they compile
into pairwise invocations of the corresponding methods. The grouping order is not specified. 
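
A minimal Python sketch of this expansion follows. The helper name nary is hypothetical, and the left-to-right grouping is an assumption, since the grouping order is left unspecified. The identity argument supplies the result for an empty argument list (0 for + and 1 for *, as stated in Section A.10); seeding the fold with it adds one invocation that a compiler would not need to generate.

```python
from functools import reduce
import operator


def nary(op, identity, *args):
    """Fold a two-argument operation over any number of arguments."""
    # Assumption: group left to right; an implementation may group differently.
    return reduce(op, args, identity)
```

Because the grouping is unspecified, a method defined on an associative restricted selector should give the same answer for any grouping, which is one reason such methods should not have side effects.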

Methods declared with restricted selectors should not have side effects. 

The identities in Table A-4 have been carefully selected to allow efficient implementation of
primitive operations without sacrificing functionality. Some identities have been omitted on
purpose. For example, * does not have to be commutative in general, nor does (* a 0) have
to equal 0. Not requiring these identities allows * to be used to multiply quaternions and
matrices.

The restricted selectors not, and, or, and xor may not be distinguishable from lognot, lo- 
gand, logor, and logxor on all implementations. Redefining these should be avoided; if 
they must be redefined, only one set should be redefined. 

Table A-3. Restricted Selectors 

not and or xor lognot logand logor logxor 

neg + - reverse-- * // reverse-// mod reverse-mod
ash reverse-ash integer-length 

Table A-4. Identities among Primitive Methods

+ is associative and commutative.
0 is an identity for +.
(- a b) = (reverse-- b a).
(- a b) = (+ a (neg b)).
(neg (neg a)) = a.
* is commutative with scalar constants and associative.
1 is an identity for *.
(* a -1) = (neg a).
(* a 2^e) = (ash a e).
(// a (neg b)) = (neg (// a b)).
(// a 2^e) = (ash a -e).
(// a b) = (reverse-// b a).
(mod a (neg b)) = (neg (mod a b)).
(mod a b) = (reverse-mod b a).
(ash a b) = (reverse-ash b a).
(ash 0 a) = 0.
(ash a 0) = a.
(not (not a)) = a.
and, or, and xor are associative and commutative.
(and a false) = false.
(and a true) = a.
(or a false) = a.
(or a true) = true.
(xor a false) = a.
(xor a true) = (not a).
(lognot (lognot a)) = a.
logand, logor, and logxor are associative and commutative.
(logand a 0) = 0.
(logand a -1) = a.
(logor a 0) = a.
(logor a -1) = -1.
(logxor a 0) = a.
(logxor a -1) = (lognot a).
(< a b) = (not (>= a b)).
(> a b) = (not (<= a b)).
(= a b) = (not (<> a b)).
(< a b) = (> b a).
(<= a b) = (>= b a).
(= a b) = (= b a).
(<> a b) = (<> b a).
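
Several of the shift and bitwise identities in Table A-4 can be checked mechanically for the built-in integers. The sketch below uses Python integers as a stand-in, on the assumption that they behave like the integers described in Section A.10 (unbounded, with floor division and two's complement bitwise operations). The function names are hypothetical.

```python
def ash(a, e):
    """Arithmetic shift: a * 2**e, rounded towards minus infinity when e < 0."""
    return a << e if e >= 0 else a >> -e


def check_identities(a, e):
    """Return True if the identities below hold for integer a and e >= 0."""
    return all([
        a * 2**e == ash(a, e),               # (* a 2^e) = (ash a e)
        a // 2**e == ash(a, -e),             # (// a 2^e) = (ash a -e)
        ash(0, e) == 0,                      # (ash 0 a) = 0
        ash(a, 0) == a,                      # (ash a 0) = a
        ~~a == a,                            # (lognot (lognot a)) = a
        (a & 0) == 0 and (a & -1) == a,      # logand identities
        (a | 0) == a and (a | -1) == -1,     # logor identities
        (a ^ 0) == a and (a ^ -1) == ~a,     # logxor identities
    ])
```

A user-defined class that redefines one of these selectors has to preserve the same identities, because the compiler is free to rewrite one side into the other.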

A.8. System and Object Operations

Objects 

(new c:standard-class) :object Method 

New, when applied to a standard class, creates and returns a new instance object of the speci- 
fied class. The object is not initialized. Some implementations may restrict the new argu- 
ment to be a constant expression. 

Copiers 

(deep-copy o:object) :object Method 

Deep-copy returns a copy of the object. Any of the object's instance variables are also recur- 
sively copied using deep-copy. If the class of the object is immutable, deep-copy may just 
return the object it received. Deep-copy may fail to terminate on circular object references. 

(shallow-copy o:object) :object Method

Shallow-copy returns a copy of the object without copying any of the object's instance vari- 
ables. If the class of the object is immutable, shallow-copy may just return the object it re- 
ceived. 

(copy o:object) :object Method 

Copy is the most appropriate copying routine for a given object. It defaults to shallow- 
copy. 
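
The difference between the two copiers is the usual one between shallow and deep copying. The Python sketch below illustrates it with the standard copy module; the class Node is hypothetical. One difference from the description above is worth noting: Python's deepcopy tracks objects it has already copied and therefore terminates on circular references, whereas deep-copy is not required to.

```python
import copy


class Node:
    """Hypothetical object with a single instance variable."""

    def __init__(self, items):
        self.items = items


original = Node([1, 2, 3])
shallow = copy.copy(original)     # like shallow-copy: instance variables are shared
deep = copy.deepcopy(original)    # like deep-copy: instance variables are copied too

# Mutating the original's instance variable is visible through the shallow
# copy but not through the deep copy.
original.items.append(4)
```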

Deallocators 

In addition to waiting for garbage collection, the following methods can be used to explicitly 
deallocate the storage for an object. Accessing an object after it has been deallocated causes 
an error. 

(deep-dispose o:object) :null Method

Deep-dispose deallocates the object's storage. Any of the object's instance variables are 
also recursively disposed using deep-dispose. Deep-dispose should not be used on circu- 
lar or multiple object references. 

(shallow-dispose o:object) :null Method 

Shallow-dispose deallocates the object's storage without disposing any of the object's in- 
stance variables. 

(dispose o:object) :null Method 

Dispose is the most appropriate deallocating routine for a given object. It defaults to shal- 
low-dispose. 

Class Inquiries 

(class-of o:object) :class Method 

Class-of, when applied to an object, returns its class.

(class-kind? o:object c:class) :boolean Method

(class-member? o:object c:class) :boolean Method 

Class-kind? returns a boolean value that specifies whether the given object is an instance 
of the given class or one of its subclasses. Class-member? is just like Class-kind? except 
that it returns true only if the object is a direct instance of the given class. 

(subclass? c1, c2:class) :boolean Method

Subclass? returns true if c1 is a subclass of c2 and false otherwise.
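
The three inquiries correspond closely to familiar tests in other object-oriented languages. The Python sketch below models them; the two classes are hypothetical stand-ins for part of the built-in hierarchy. Note that Python's issubclass treats a class as a subclass of itself, which the description of subclass? above neither requires nor forbids.

```python
class Magnitude:
    pass


class Number(Magnitude):
    pass


def class_kind(o, c):
    """True if o is an instance of c or of one of its subclasses."""
    return isinstance(o, c)


def class_member(o, c):
    """True only if o is a direct instance of c."""
    return type(o) is c


def subclass(c1, c2):
    """True if c1 is a subclass of c2 (Python also accepts c1 is c2)."""
    return issubclass(c1, c2)
```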






A.9. Distributed Objects 

distobj Class 

Distobj is the distributed object class.

Group and Constituents 

A distributed object consists of a group name and one or more constituent objects. The con- 
stituent objects act just like normal objects except that they inherit methods and instance 
variables from the class distobj and they respond to the group and get-group messages.
A group name indicates the entire collection of distributed objects. When a method is called 
on the group name, it is processed by one of the distobj's constituent objects, as though the 
method were called on that constituent object. The identity of the constituent object receiv- 
ing the message is left unspecified; implementations are encouraged to heuristically pick dif- 
ferent constituent objects for different calls to the group, thereby facilitating concurrency for 
distributed object operations. When a constituent object is processing a method, self is the 
constituent object, not the group name. 

Creation 

(new c:distributed-class n-constituents:integer) :distobj Method

New, when applied to a distributed class, creates and returns a new distributed object of the 
specified class with the given logical number of constituents. The constituents are not initial- 
ized. 

The distributed object that is created may contain more constituents than n-constituents. The 
runtime system determines an appropriate physical number of constituents for the dis- 
tributed object that is at least as large as n-constituents. The additional constituents should 
be prepared to respond to messages sent to the distributed object. 

Operations 

(co o:distobj n:integer) :distobj Method

Co returns the nth constituent object of the distributed object. O can be either the group
object or any of its constituents. N must be between 0 and the physical number of constituent
objects in the distobj minus one.

(logical-limit o:distobj) :integer Method

Logical-limit is the logical number of constituent objects in the distributed object. 

(physical-limit o:distobj) :integer Method

Physical-limit is the physical number of constituent objects in the distributed object. The
constituent objects are numbered between 0 and physical-limit minus one, inclusive.
Physical-limit is never less than logical-limit.

(index o:distobj) :integer Method 

Index is the number of a particular constituent object in a distributed object. Index ranges
between 0 and physical-limit minus one, inclusive.

group:distobj Instance Variable

(group o:distobj) :distobj Method

(get-group o:distobj) :distobj

Group is the inverse of co: it returns the group object of the given distributed object. O can
be either the group object or any of its constituents; if o is already a group object, group just
returns it. Get-group is functionally equivalent to group; it is provided to avoid name
conflicts with the group variable inside distributed object methods.
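
The relationship between a group, its constituents, and the two limits can be summarized in a small model. The Python sketch below is illustrative only. The class names, the round-robin choice of a constituent, and the policy of rounding the physical number of constituents up to a power of two are all assumptions; the text requires only that physical-limit be at least logical-limit and leaves the choice of receiving constituent unspecified.

```python
class Constituent:
    def __init__(self, group, index):
        self.group = group    # models the group instance variable
        self.index = index    # models (index o)


class DistObj:
    """Toy group object for a distributed object."""

    def __init__(self, n_constituents):
        self.logical_limit = n_constituents
        # Assumed policy: round up to a power of two. Only
        # physical-limit >= logical-limit is actually required.
        physical = 1
        while physical < n_constituents:
            physical *= 2
        self.physical_limit = physical
        self.constituents = [Constituent(self, i) for i in range(physical)]
        self._next = 0

    def co(self, n):
        """Return the nth constituent, 0 <= n < physical-limit."""
        if not 0 <= n < self.physical_limit:
            raise IndexError("constituent index out of range")
        return self.constituents[n]

    def pick(self):
        """Choose a constituent to process a call made on the group."""
        c = self.constituents[self._next]
        self._next = (self._next + 1) % self.physical_limit
        return c
```

In this model a request for five constituents yields eight, so constituents 5 through 7 exist even though the program never asked for them, which is why the additional constituents must be prepared to respond to messages.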

A.10. Logical and Arithmetic Operations

Comparisons 

(eq o1, o2:object) :boolean Function

(neq o1, o2:object) :boolean Function

Eq returns true if the two objects are indistinguishable, that is, if there is no legal way of
distinguishing o1 from o2. For mutable objects this means that o1 and o2 are the same object.
For immutable objects, eq may in addition return true if o1 and o2 are different objects that
contain the same data.

Eq may return unusual results for inline classes — an instance object of an inline class is not 
necessarily eq to itself, but eq will never return true on distinguishable objects. 

Neq is the logical negation of eq. 

(= o1, o2:object) :boolean Method

(<> o1, o2:object) :boolean Method

These comparisons return true if o1 is equal to or not equal to o2, respectively. Equality
means numeric equality for numbers. It defaults to eq or neq for other objects, but the =
method can be overridden to specify a different criterion for a particular class.

(< m1, m2:magnitude) :boolean Abstract Method

(<= m1, m2:magnitude) :boolean Abstract Method

(> m1, m2:magnitude) :boolean Abstract Method

(>= m1, m2:magnitude) :boolean Abstract Method

These comparisons return true if m1 is less than m2, m1 is less than or equal to m2, m1 is
greater than m2, or m1 is greater than or equal to m2, respectively. For the purposes of
comparison, false is considered to be less than true. It is an error to use <, <=, >, or >= to
compare an object from one direct subclass of magnitude with one of another direct subclass of
magnitude; for example, a boolean cannot be compared with an integer.

(max m1, m2:magnitude) :magnitude Method

(min m1, m2:magnitude) :magnitude Method

Max returns the greater of m1 and m2, while min returns the lesser one. Both max and min
use one of the comparison operations above to decide which is the greater or lesser, and the
same caveats as above apply.

Logical Operations 

(not b:boolean) :boolean Method

Not returns the logical negation of b.

(and (b:boolean)*) :boolean Method

And returns the logical AND of its arguments. If no arguments are specified, and returns
true.

(or (b:boolean)*) :boolean Method

Or returns the logical inclusive OR of its arguments. If no arguments are specified, or
returns false.

(xor (b:boolean)*) :boolean Method

Xor returns the logical exclusive OR of its arguments. If no arguments are specified, xor 
returns false. 

(sc-and (b:boolean)*) :boolean Macro 

(sc-or (b:boolean)*) :boolean Macro 

These are short-circuit versions of and and or. They evaluate arguments sequentially from 
left to right only as far as necessary for the answer to be unambiguously determined. 
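
Short-circuit evaluation can be modeled by passing unevaluated arguments. In the Python sketch below each argument is a zero-argument callable, which stands in for the unevaluated expression a macro would receive; the function names are hypothetical. The results for empty argument lists are assumed to match those of and and or.

```python
def sc_and(*thunks):
    """Evaluate callables left to right, stopping at the first false result."""
    for thunk in thunks:
        if not thunk():
            return False
    return True


def sc_or(*thunks):
    """Evaluate callables left to right, stopping at the first true result."""
    for thunk in thunks:
        if thunk():
            return True
    return False
```

The practical consequence is that later arguments may have side effects or be expensive without cost when an earlier argument already determines the answer.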

Arithmetic Operations 

For most binary arithmetic operations, the class of the result is the class of the most general 
argument. For example, if two integers are added, the result is an integer, but if an in- 
teger and a float are added, the result is a float. User-defined classes may define other 
numeric subclasses, in which case they have to handle appropriate coercions themselves — if a 
number is added to a member of a user-defined subclass of number, the + method for the 
user-defined subclass will have to dispatch on the type of its second argument. 

(zero? n:number) :boolean Method 

Zero? returns true if n is zero and false otherwise. 

(neg n:number) :number Abstract Method 

Neg returns the negation of n. The class of the result value is the same as the class of n. 

(+ (n:number)*) :number Abstract Method

+ returns the sum of its arguments. If no arguments are specified, + returns 0.

(- n1, n2:number) :number Abstract Method

- returns the difference of its arguments, n1-n2.

(* (n:number)*) :number Abstract Method 

* returns the product of its arguments. If no arguments are specified, * returns 1. 

(/ n1, n2:number) :number Abstract Method

/ returns the quotient of its arguments, n1/n2. If n1 and n2 are both integers and n1 is
not exactly divisible by n2, the result is a float. If n2 is zero, either an error occurs or some
representation of infinity is substituted as an answer.

(// n1, n2:integer) :integer Method

// returns the integer quotient of its arguments rounded towards -∞, ⌊n1/n2⌋. If n2 is zero,
either an error occurs or some representation of infinity is substituted as an answer. Having
// round towards -∞ allows the use of ash to divide when the divisor is an integral power of
two.

(mod n1, n2:integer) :integer Method

Mod returns the nonnegative remainder of dividing n1 by n2, n1-n2*⌊n1/n2⌋. If n2 is zero,
either an error occurs or some representation of an indeterminate number is substituted as an
answer. Having mod return the nonnegative remainder allows the use of logand to find the
remainder when the divisor is an integral power of two. When the remainder is nonzero, its
sign is always the same as the sign of the divisor n2. Also, (+ (mod n1 n2) (* n2 (// n1 n2))) = n1.
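
The rounding rules for // and mod, and the shortcuts they permit for powers of two, can be checked with a short Python sketch. Python's own integer division happens to round towards minus infinity as well, so it serves as a convenient model; the function names are hypothetical.

```python
def floor_div(n1, n2):
    """Model of //: the quotient rounded towards minus infinity."""
    return n1 // n2


def mod(n1, n2):
    """Model of mod, computed from the formula n1 - n2 * floor(n1 / n2)."""
    return n1 - n2 * floor_div(n1, n2)


def div_pow2(n, e):
    """Divide by 2**e with a shift; agrees with floor_div(n, 2**e)."""
    return n >> e


def mod_pow2(n, e):
    """Remainder modulo 2**e with a mask; agrees with mod(n, 2**e)."""
    return n & ((1 << e) - 1)
```

The shift and mask versions agree with the general ones even for negative dividends, which would not be the case if division truncated towards zero.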

(ash n1:integer n2:integer) :integer Method

(ash n1:float n2:integer) :float Method

Ash returns n1 multiplied by two raised to the n2th power, n1*2^n2. If n1 is a float, no
rounding takes place; however, if n1 is an integer and n2 is negative, the result is rounded
towards -∞.

(integer-length n:integer) :integer Method 

Integer-length returns the bit "size" of n. For positive n this is ⌈log2(n+1)⌉, while for
negative n it is equal to ⌈log2(-n)⌉.
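
A Python sketch of the integer forms of ash and integer-length follows. It assumes Python's unbounded integers as the model and covers only the integer variant of ash; the function names are hypothetical. For negative n the bit length of the bitwise complement gives the same value as the ceiling formula above.

```python
def ash(n1, n2):
    """n1 * 2**n2 for integers, rounded towards minus infinity if n2 < 0."""
    return n1 << n2 if n2 >= 0 else n1 >> -n2


def integer_length(n):
    """Bit size of n, not counting the sign bit."""
    # For n >= 0 this is ceil(log2(n + 1)); for n < 0 it is ceil(log2(-n)).
    return n.bit_length() if n >= 0 else (~n).bit_length()
```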

Bitwise Logical Operations 

(lognot b:boolean) :boolean Method

(logand (b:boolean)+) :boolean Method 

(logor (b:boolean)+) :boolean Method 

(logxor (b:boolean)+) :boolean Method 

(lognot b:integer) :integer Method 

(logand (b:integer)*) :integer Method 

(logor (b:integer)*) :integer Method 

(logxor (b:integer)*) :integer Method 

These methods perform bitwise logical operations. When called on booleans, they perform
the same operations as not, and, or, and xor, respectively. When called on integers, they
perform the corresponding operations bitwise on semi-infinite two's complement representations
of the integers, treating 0 as false and 1 as true. The integers do not have to be
internally stored in the two's complement form; all that is necessary is that these operations
act as if they were. When supplied with no arguments, logand returns -1, while logor and
logxor return 0.
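
The values returned for empty argument lists are simply the identities of the corresponding operations: -1 is all ones in two's complement, so it is the identity for bitwise AND. The Python sketch below models the integer forms; Python integers also behave as semi-infinite two's complement values under these operators. The function names mirror the methods above but are otherwise hypothetical.

```python
from functools import reduce
import operator


def logand(*ints):
    return reduce(operator.and_, ints, -1)   # -1 (all ones) is the AND identity


def logor(*ints):
    return reduce(operator.or_, ints, 0)     # 0 is the OR identity


def logxor(*ints):
    return reduce(operator.xor, ints, 0)     # 0 is the XOR identity


def lognot(n):
    return ~n
```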

A.11. Locks

Locks are used to synchronize processes. A lock can be acquired by only one process at a 
time, and the acquiring operation is atomic. After a process has acquired a lock, it can pro- 
ceed to perform whatever exclusive operations it wants to do. When it is done, it should re- 
lease the lock to make it available again. If a process attempts to acquire a lock that is busy 
(acquired), it will wait until the lock is available. 

Two built-in lock classes are provided: simple-lock and queueing-lock. Simple-lock
is a very cheap and fast implementation intended for situations in which a lock is not
acquired for long periods of time and there is little contention for the lock. Simple-locks are
adequate for most purposes. Queueing-locks are heavy-duty locks for use in situations
where there may be significant contention for a lock.

Lock Operations 

(new-simple-lock) :simple-lock Function 

(new-queueing-lock) :queueing-lock Function 

New-simple-lock creates a new simple lock, while new-queueing-lock creates a new 
queueing lock. The lock is initially available. 

(init l:simple-lock) :null Method 

(init l:queueing-lock) :null Method

Init reinitializes the lock, making it available regardless of its previous state.

(acquire l:lock) :null Abstract Method 

Acquire acquires the lock. If the lock is busy, acquire waits until the lock is available be- 
fore acquiring it and returning. 

(release l:lock) :null Abstract Method

Release releases the lock. If the lock is already available, release signals an error. 

(busy? l:lock) :boolean Abstract Method

Busy? returns true if the lock is busy and false otherwise. 

(with-locks ((l:lock)*) body) Macro 

With-locks first acquires all of the locks listed, in the order in which they are listed, then
evaluates body, and finally releases all of the locks. It returns the value of body. 
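
The behavior of with-locks can be approximated in Python with the standard threading and contextlib modules. In the sketch below the function name with_locks is hypothetical, and body is passed as a zero-argument callable. Two details are assumptions that go beyond the description above: the locks are released in reverse order of acquisition, and they are also released if body raises an error.

```python
import threading
from contextlib import ExitStack


def with_locks(locks, body):
    """Acquire locks in the listed order, run body, release all, return its value."""
    with ExitStack() as stack:
        for lock in locks:
            # enter_context acquires the lock and registers its release.
            stack.enter_context(lock)
        return body()
```

Because the locks are always acquired in the order listed, two callers that list the same locks in the same order cannot deadlock against each other; callers that list them in different orders still can.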

A.12. Strings and Arrays

Strings and arrays are the primitive data structures for keeping track of indexed collections 
of data. All primitive strings and arrays are subclasses of the class array. The subclasses of 
class array can be implemented as arrays, but implementations are encouraged to pack in- 
teger-arrays, strings, and boolean-arrays to conserve space and time. 

Creating Arrays 

(new-simple-array size:integer) :simple-array Method 

New-simple-array creates a new simple array of arbitrary objects. Size specifies the 
number of elements in the array; the elements are numbered 0 through size-1. The array's
elements are not initialized. 

(new-integer-array size:integer low, high:integer) :integer-array Method 

New-integer-array creates a new array of integers in the range between low and high, 
inclusive. Low must be less than or equal to high. Size specifies the number of elements in 
the array; the elements are numbered 0 through size-1. The array's elements are not
initialized.

(new-string size:integer) :string Method

New-string creates a new array of characters, also called a string. Size specifies the
number of elements in the array; the elements are numbered 0 through size-1. The array's
elements are not initialized.

(new-boolean-array size:integer) :boolean-array Method 

New-boolean-array creates a new array of booleans. Size specifies the number of
elements in the array; the elements are numbered 0 through size-1. The array's elements are
not initialized.

Operations on Entire Arrays 

(fill a:array value) :array Abstract Method 

Fill destructively writes value to every element of the given array. If the array is an in- 
teger-array, a string, or a boolean-array, the value must have the correct type and, in 
the case of integer-array, it must be in the range specified when the array was created; 
otherwise, the results are unspecified. Fill returns the updated array. 

(init a:array f :funct) :array Abstract Method 

Init concurrently calls f on integers between 0 and the size of a minus one, inclusive, and
stores the results in the corresponding elements of a. If f or any other function tries to read
an element of a, it will wait until the value is available. It is an error for f or any other
function to try to alter the values of elements of a before init returns. Init returns the array a
after all calls to f have returned.

(map src:array dst:array f :funct) :array Abstract Method 

Map concurrently calls f on each element of the src array and stores the results in the corre- 
sponding elements of the dst array. The sizes of the two arrays must be equal. If src is a 
simple-array, so must be dst. Src and dst may be the same array. If f or any other 
function tries to read an element of the dst array, it will wait until the value is available. It 
is an error for f or any other function to try to alter the values of elements of the dst array
before map returns. Map returns the dst array after all calls to f have returned. 
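
The concurrent behavior of init and map can be approximated with a thread pool. The Python sketch below uses ordinary lists for arrays and the standard concurrent.futures module; the function names are hypothetical. It models only the final result and the fact that the call returns after every call to f has returned; it does not model readers waiting on individual elements that are not yet available.

```python
from concurrent.futures import ThreadPoolExecutor


def array_init(a, f):
    """Store f(i) in a[i] for every index i, running the calls concurrently."""
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in index order, whatever order they finish in.
        for i, value in enumerate(pool.map(f, range(len(a)))):
            a[i] = value
    return a


def array_map(src, dst, f):
    """Store f(src[i]) in dst[i] for every i; src and dst may be the same list."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have equal sizes")
    with ThreadPoolExecutor() as pool:
        for i, value in enumerate(pool.map(f, src)):
            dst[i] = value
    return dst
```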

(for-each a:array f :funct) :array Abstract Method 

(nfor-each a:array f :funct) :array Abstract Method 

Both of the above methods concurrently call f on each element of the array and then return 
the array without modifying it. Nfor-each does not wait until any of the calls to f return, 
while for-each does. 

Accessing Arrays 

[a:array pos:integer] :object Abstract Method 

(get a:array pos:integer) :object 

Get returns the element at position pos of the given array. Get signals an error if pos is 
outside the bounds of the array. The results of accessing an uninitialized element are un- 
specified. 

(set [a:array pos:integer] value:object) :array Abstract Method 

(put a:array pos:integer value:object) :array

Put destructively writes value at position pos of the given array. Value is not touched. 
Put signals an error if pos is outside the bounds of the array. If the array is an integer- 
array, a string, or a boolean-array, the value must have the correct type and, in the 
case of integer-array, it must be in the range specified when the array was created; oth- 
erwise, the results are unspecified. Put returns the updated array. 

(size a:array) :integer Abstract Method 

Size returns the size of the array, as specified when the array was created. 

A.13. Input and Output
Streams 

Streams are sources and sinks of data. A stream is usually a connection to a terminal or to a 
file, but other uses of streams are possible. Concurrent Smalltalk defines a general class 
stream as well as a specific implementation of streams, system-stream. Other user-defined
stream classes may be defined as subclasses of stream.

Operations on General Streams 

Reading 

(read-stream-char s:stream) :object Abstract Method 

Read-stream-char reads a character from stream s and returns it. If there is no more in- 
put available on the stream, read-stream-char returns nil. 

(read-stream-line s:stream) :object Abstract Method 

Read-stream-line atomically reads a line from stream s and returns it in the form of a 
string (without the trailing line terminator). If there is no more input available on the 
stream, read-stream-line returns nil. 

(read-stream s:stream) :object Abstract Method 

Read-stream reads some representation of a Concurrent Smalltalk object from stream s 
and returns it. If there is no more input available on the stream, read-stream returns the 
constant end-of-file.

end-of-file:object Constant 

This unique constant is returned when read-stream encounters an end of file.

(stream-char-ready? s:stream) :boolean Abstract Method 

Stream-char-ready? returns true if a character is ready to be read from stream s. It is 
not guaranteed to return false otherwise, so an implementation that always returns true 
is acceptable. 

Writing 

(write-stream-char s:stream ch:character) :null Abstract Method 

Write-stream-char writes character ch onto stream s. 

(write-stream-string s:stream string:string) :null Method

Write-stream-string writes string string onto stream s. Write-stream-string is 
equivalent to calling write-stream-char on each character in string except that string 
is written atomically. 

(write-stream s:stream (o:object)*) :null Method 

Write-stream writes some representation of the given Concurrent Smalltalk objects onto 
stream s. It uses print to format objects it does not know about. Care should be taken 
when writing circular structures to make sure that write-stream terminates. 
(display-stream s:stream (o:object)*) :null Method

Display-stream writes some representation of the given Concurrent Smalltalk objects onto 
stream s. Strings and characters are written literally, without escape characters. Care 
should be taken when writing circular structures to make sure that display-stream
terminates.

Atomicity 

(split s:stream) :stream Abstract Method 

Split returns a new stream that can be used for atomic writing to s. Anything written to 
the returned stream is atomically written onto s when join is called on the returned stream. 

(join s:stream) :null Abstract Method

Join joins s back to a stream from which it was split. It is an error to call join on a stream 
not returned by split or to call it more than once on such a stream. 
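
The buffering behaviour of split and join can be illustrated with a small Python model. This is only a sketch of the semantics described above, not the Cosmos implementation; the class names Stream and SplitStream and the in-memory list used as the stream contents are assumptions made for illustration.

```python
class Stream:
    """Toy in-memory stream; `data` collects every character written to it."""

    def __init__(self):
        self.data = []

    def write_char(self, ch):
        self.data.append(ch)

    def split(self):
        # Return a child stream that buffers its writes until join() is called.
        return SplitStream(self)


class SplitStream(Stream):
    """Stream returned by split(); only such streams may be joined."""

    def __init__(self, parent):
        super().__init__()
        self.parent = parent
        self.joined = False

    def join(self):
        if self.joined:
            raise RuntimeError("join called more than once on a split stream")
        self.joined = True
        # Deliver the whole buffer in one step so it cannot interleave with
        # characters written by other users of the parent stream.
        self.parent.data.extend(self.data)
        self.data = []


def demo():
    terminal = Stream()
    a, b = terminal.split(), terminal.split()
    for x, y in zip("abc", "XYZ"):  # the two writers interleave in time
        a.write_char(x)
        b.write_char(y)
    b.join()
    a.join()
    return "".join(terminal.data)
```

In this model demo() yields the two buffers back to back, in join order, even though the individual writes were interleaved.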

Input and Output Streams 

terminal-stream:system-stream Global 

Terminal-stream is the system-stream used for interaction with the terminal. 

(read-char) :object Function 

(read-line) :object Function 

(read) :object Function

(char-ready?) :boolean Function

(write-char ch:character) :null Function

(write-string string:string) :null Function 

(write (o:object)*) :null Function 

(display (o:object)*) :null Function 

(split-terminal) :stream Function 

These functions are the terminal equivalents of the general stream methods above. 

Formatting 

(print o:object s:stream) :null Abstract Method

Print is used for formatting arbitrary objects for the purposes of write-stream. Print 
should output some readable representation of object o onto stream s. 

(display-print o:object s:stream) :null Method

Display-print is used for formatting arbitrary objects for the purposes of display- 
stream. Display-print should output some readable representation of object o onto 
stream s, avoiding escape characters where possible. 
A.14. Macros

Concurrent Smalltalk provides a macro facility which can be used to extend the language. A 
macro consists of a pattern, an optional guard, and a replacement. The pattern can contain 
variables or literals (a literal is an identifier). If it matches with an expression and the guard 
is satisfied, that expression is replaced by the replacement, which can be either another pat- 
tern or a Common Lisp function. 

(defmacro pattern [guard] replacement) Top-level Macro 

pattern ::= literal | ?name | !name | (pattern* [pattern . pattern]) | @pattern

replacement ::= r-pattern | lisp-statement

r-pattern ::= literal | ?name | !name | (r-pattern* [r-pattern . r-pattern])

guard ::= (guard lisp-statement)

lisp-statement ::= #L lisp

The macro pattern is a nested list of literals and macro variables. Variables are preceded by 
question marks (?) or exclamation points (!). Question-mark variables can match identifiers,
numbers, and lists, while exclamation-point variables can only match identifiers. The dotted 
notation at the end of a list indicates that the rest of the list should match the pattern after 
the dot. When a pattern is matched to a candidate statement, all instances of the same vari- 
able have to match identical forms. The pattern can be as simple as ?x, which will match 
any statement. 

If an @ symbol precedes a pattern, the form to which the pattern would match is macro-ex- 
panded before it is matched to the pattern. To avoid infinite loops, @ should not be the first 
symbol in a macro pattern. 

The guard, if present, is a Common Lisp statement that returns a boolean value. If the value 
returned is true, the macro replacement is substituted for the pattern; if not, the macro is 
not expanded. The values of the ? and ! variables are bound in a Common Lisp scope just 
outside the statement, so the Common Lisp statements can refer to the matched values of the 
variables just by referring to the correct variable names (including the leading ? or !).

Replacement can be either another pattern or another block of Common Lisp statements. If 
replacement is a pattern, the values of the macro variables are substituted in it, and the re- 
placement pattern replaces the original pattern in the code. If replacement is a Common Lisp 
statement, it is expected to return a list which replaces the original pattern in the code. As 
in the case of a guard, the Common Lisp statement has access to the matched values of the 
macro variables. 

The macro replacement pattern can be another macro. Macros are expanded until the result- 
ing form does not satisfy any of the existing macro patterns and guards. When several 
macros match a form, the form is expanded using the macro that was most recently defined. 
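
The matching and replacement rules above can be sketched in Python. This is a simplified model rather than Optimist II's implementation: forms are represented as nested Python lists, identifiers as strings, the dotted tail as a "." element before the last pattern, and guards as Python callables instead of Common Lisp statements. The @ prefix and Lisp replacement functions are omitted, only the top-level form is expanded, and the unless and double macros are hypothetical examples.

```python
def is_variable(p):
    return isinstance(p, str) and len(p) > 1 and p[0] in ("?", "!")


def match(pattern, form, bindings):
    """Extend `bindings` so that pattern matches form; return False on failure."""
    if is_variable(pattern):
        if pattern[0] == "!" and not isinstance(form, str):
            return False                      # !name matches identifiers only
        if pattern in bindings:
            return bindings[pattern] == form  # repeated variable: identical forms
        bindings[pattern] = form
        return True
    if not isinstance(pattern, list):
        return pattern == form                # literal
    if not isinstance(form, list):
        return False
    if len(pattern) >= 2 and pattern[-2] == ".":
        head, rest = pattern[:-2], pattern[-1]
        if len(form) < len(head):
            return False
        return (all(match(p, f, bindings) for p, f in zip(head, form))
                and match(rest, form[len(head):], bindings))
    return (len(pattern) == len(form)
            and all(match(p, f, bindings) for p, f in zip(pattern, form)))


def substitute(template, bindings):
    """Replace macro variables in a replacement pattern by their matched values."""
    if isinstance(template, str):
        return bindings.get(template, template)
    if isinstance(template, list):
        if len(template) >= 2 and template[-2] == ".":
            head = [substitute(t, bindings) for t in template[:-2]]
            return head + list(substitute(template[-1], bindings))
        return [substitute(t, bindings) for t in template]
    return template


def expand(form, macros):
    """Expand the top-level form until no macro pattern and guard apply."""
    while True:
        for pattern, guard, replacement in reversed(macros):  # newest first
            bindings = {}
            if match(pattern, form, bindings) and (guard is None or guard(bindings)):
                form = substitute(replacement, bindings)
                break
        else:
            return form


# Hypothetical macros, oldest first: (pattern, guard, replacement).
MACROS = [
    (["unless", "?c", ".", "?body"], None,
     ["if", ["not", "?c"], ["begin", ".", "?body"]]),
    (["double", "?n"], lambda b: isinstance(b["?n"], int), ["*", 2, "?n"]),
]
```

With these definitions, expand(["unless", ["=", "x", 0], ["print", "x"]], MACROS) rewrites the form into an if around a begin, while ["double", "x"] is left alone because its guard rejects a non-integer argument.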
A.15. Environment

Errors 

(error (msg:object)*) Function

Error signals a run-time error. The arguments, if any, should contain descriptive informa- 
tion about the error. The interpretation of the arguments' values is implementation-depen- 
dent. 

(halt) Function 

Halt halts execution of the current program due to a run-time error. Debugging information 
about the function or method in which the halt took place may be printed. 

Utilities 

(include "file-name") Top-level Primitive 

Include reads the definitions in the file named file-name, as if that file were included in
place of the include primitive. 

Options 

(pragma ...) Top-level Primitive 

Pragma is a general compiler declaration and can contain any implementation-dependent in- 
formation. 

(declare option value) Top-level Primitive 

Declare sets the compiler option named option to the value specified. Value must be a legal 
value for the option; most compiler options are booleans, and for these value must evaluate to 
either true or false. Value must be a constant expression. 

(option option) Primitive 

Option returns the compile-time value of the specified compiler option.
This appendix describes the procedure for using the Optimist II compiler on a Macintosh II to
compile Concurrent Smalltalk programs. In addition, a few helpful non-standard Concurrent 
Smalltalk features implemented by Optimist II are described. 

Starting the Compiler 

To start the compiler, load the image containing the compiler and the Common Lisp envi- 
ronment. If such an image is not available, load Common Lisp, PCL, the Loop macro, and 
the Optimist.Lisp file. Execute the (optimist :compile-optimist) command to compile 
and load the compiler, or, if it was already compiled, use (optimist :load-optimist) to
load the compiler. 

The compiler provides only one useful external Lisp function, (interactive-cst).
Typing (interactive-cst) will enter an interactive Concurrent Smalltalk listener loop. 

Top-Level Commands 

Utility Commands 

(begin body) Top-Level Primitive 

Due to constraints in the compiler, a select few forms such as include and defclass (but
not all of the top-level primitives; most of the primitives listed as top level really only re- 
quire that they not be included in any function or method) must be present at the top level. 
However, sometimes it is desirable to emit sequences of those directives as results of macros; 
to allow this, a special form of begin was provided. If begin appears at the top level, every 
form inside it is also evaluated at the top level. 

(set name expression) Top-Level Macro 

Set normally sets the variable name to the value of expression. However, if it is placed at 
the top level, it is also allowed to create a new global variable name if one does not exist
already. Thus, at the top level, set acts as either set or defglobal, depending on whether
the global variable name already exists. 

(include) Top-Level Primitive 

Include, when passed no file argument, will let the user interactively choose a text file and 
then include it. This feature is only available on the Macintosh version of Optimist II. 

Viewing Objects 

While the listener loop is active, any Concurrent Smalltalk command will be immediately 
evaluated, and the results displayed in the listener window. The resulting object may be 
displayed in a somewhat strange syntax; for example, integers may be displayed as 
#<Integer- 5>, and booleans as #<True> or #<False>. The following commands may be 
used to show the internal structure of objects: 

(show o:object) :object Top-Level Primitive 

Show shows as a side effect the Optimist II internal representation of an object. If the object 
is a function, its hcodes are shown; if the object is a complex object, some of its structure may 
be shown. The output is controlled by the CLOS show generic function. The value of the 
show directive is the object itself, so the object is usually printed normally after it is shown. 
Please note that the hcodes shown for a function are only an approximation of the actual
hcode data structure used internally to represent the function. Some of the more esoteric 
fields are not shown, and sometimes a function may have two variables with the same name, 
which leads to confusing output. Variable names were included for human readability only; 
Optimist II does not use them internally. It is able to keep the variables distinct regardless 
of their names. Also, since a digraph is a nonlinear structure, pseudo-hcodes such as jump 
labels in conditionals and jump, label, and break hcodes are inserted into the output to make 
it readable. 

(describe o:object) :object Top-Level Primitive 

Describe describes as a side effect the Optimist II internal representation of an object. It is 
just like show except that the information displayed is longer and more detailed. 

(show-hcode f:function [#Llisp-function-name]) :function Top-Level Primitive

Show-hcode calls the Optimizer's non-MDP-specific optimizations to optimize the function 
and shows the resulting hcode. Show-hcode may invoke global optimizations and try to in- 
line the functions called by f, so this directive may take some time to execute. When the 
progress option is true (the default), progress information is displayed in the listener win- 
dow while this directive is executing. Detailed progress can be obtained by setting the de- 
tailed-progress option. Show-hcode performs no side effects on the Concurrent 
Smalltalk environment, and it does not do a treewalk of the Concurrent Smalltalk program. 
Show-hcode returns f as its result. 

If lisp-function-name is provided, instead of showing the optimized hcode, show-hcode calls 
the Lisp function lisp-function-name with the optimized hcode as an argument. Describe-din- 
odes is a useful Lisp function that will describe the compiled hcode in a little more detail. 

Show-hcode will not optimize a selector. If viewing optimized method code is desired, the 
method must be extracted explicitly using the Concurrent Smalltalk method primitive. 

(show-mdp-hcode f:function [#Llisp-function-name]) :function Top-Level Primitive

Show-mdp-hcode is just like show-hcode except that it also performs the MDP-specific 
hcode optimizations. 

(show-asm f:function [#Llisp-function-name]) :function Top-Level Primitive

Show-asm compiles the function f all the way to assembly code and prints the resulting 
MDPSim-compatible text. If lisp-function-name is supplied, it is assumed to be a lisp function 
and called with the assembly language module as its only argument. 

Compiling Programs 

(compile f:object ["output-file-name"]) :object Top-Level Primitive

Compile compiles and treewalks the Concurrent Smalltalk data structures starting with f as 
a root. Normally f is a function, in which case it is compiled to assembly language along with 
any other functions that it might need. If output-file-name is specified, the MDPSim file is 
written to a new file named output-file-name; otherwise, the output is sent to the listener. 
When the progress option is true (the default), progress information is displayed in the lis- 
tener window while this directive is executing. Detailed progress can be obtained by setting 
the detailed-progress option. 

Options 

As described in Section A.15, Concurrent Smalltalk options can be set using the declare 
Concurrent Smalltalk primitive and examined using the option primitive. The options
currently provided by Optimist II are listed in Table B-1.
Table B-1. Options

Option                    Default  Action

n-nodes                   false    Define the number of nodes of a simulated J-Machine. This
                                   option only affects Optimist's internal interpreter; the
                                   compiled code is generic and will work on a J-Machine of any
                                   size (as long as the dimensions are powers of two).
precise                   true     Inhibit optimizations that would affect the semantics of
                                   futures and lazy-futures in a few esoteric cases. If following
                                   precise Concurrent Smalltalk semantics is not important,
                                   disabling this option can produce significant performance
                                   improvements.
delete-dead-defs                   Remove assignments to variables that will not be used again.
delete-moves              true     Try to remove unnecessary move statements.
delete-touches            true     Try to remove unnecessary touch statements.
vflow-optimizations       true     Calculate dataflow information and use it to perform a variety
                                   of optimizations such as changing x<-y=0, branch if x false
                                   sequences to BNE instructions.
fold-constants            true     Fold constants. For example, replace 1+2 by 3. Also remove
                                   conditional branches when it can be determined that the
                                   condition is always true or always false.
fold-global-constants     true     Fold constants globally. For example, replace a call through a
                                   selector with a call of the method when the method can be
                                   determined using type analysis. This option is relevant only
                                   when fold-constants is true.
forward-tails             true     Enable the altering of application hcodes immediately followed
                                   by returns into tail-forwarded applications which allow the
                                   process to be deallocated and the answer directly forwarded to
                                   the caller. This is the equivalent of tail recursion.
merge-code                true     Merge common pieces of code wherever possible.
inline                    true     Inline small functions.
inline-size-cutoff        12       Set the size cutoff for automatically deciding whether to
                                   inline a function. Increasing this number causes larger
                                   functions to be inlined.
optimize-built-ins        true     Perform local built-in optimizations such as changing
                                   multiplications to shifts.
compact-vars              true     Compact variables in the context to use as few slots as
                                   possible.
reg-variables             true     Assign variables to registers whenever possible.
lru-register-allocation   true     Use the least-recently-used algorithm to allocate temporary
                                   registers during code generation.
frame-touches             true     Accumulate information about which variables are touched and
                                   optimize touches when the variables are known to be touched.
frame-regs                true     Keep track of variables in the registers during code
                                   generation and use values from the registers instead of from
                                   memory whenever possible.
frame-migrate             true     Keep track of whether it is possible for the instance object
                                   to have migrated away. Don't force it if it could not have
                                   migrated away.
lazy-ivar-access          true     Don't XLATE the instance object if there are no references to
                                   it.
lazy-contexts             true     Don't allocate a context unless it is actually used.
fast-contexts             true     Use fast contexts whenever possible.
optimize-send-self        true     Send message to the current node if the receiver is self and
                                   it is not atomic.
fast-apply                true     Use ApplyFunction and ApplySelector instead of Apply whenever
                                   possible.
compact-sends             true     Try to combine SENDs and SENDEs into SEND2s and SEND2Es.
compact-DCs               true     Try to align DCs on word boundaries whenever possible.
delete-locals             true     Delete local variables in an intermediate stage of the
                                   compilation. This makes no difference in the final output, but
                                   makes the hcode look prettier and may speed up code
                                   generation.
warn-free-references      false    Emit a warning every time a free reference is found in a
                                   method or function.
progress                  true     Print progress reports.
detailed-progress         false    Print very long progress reports.
permanent-definitions     false    Use defconstant instead of defparameter when compiling
                                   function and method definitions. When this option is set, a
                                   warning is emitted every time a free reference is found in a
                                   method or function regardless of the setting of
                                   warn-free-references.
print-pc                  true     Print program counter values as comments in output.
lisp-break                true     Enter a Lisp break loop upon a Concurrent Smalltalk warning or
                                   error.
Loading Cosmos

To use Cosmos, launch MDPSim using the Cosmos.m file as an argument. You may also
wish to specify the J-Machine's dimensions as arguments to MDPSim. Use -x x -y y -z
z, where x, y, and z are integers; they should be powers of two. To avoid using too much
memory, you may wish to allocate less memory per MDP with the -msize mem option. 

When Cosmos.m is assembled by MDPSim, it will automatically load the operating system
onto the MDPs and initialize the MDPs. This process may take anywhere from a few seconds 
to a few minutes depending on how many MDPs are present and the speed of the host com- 
puter. 

Loading User Programs 

Once Cosmos is ready, a user program compiled by Optimist II can be loaded. Use the 
MDPSim include command to load the program generated by Optimist II. Keep in mind 
that Cosmos puts MDPSim into the case-sensitive mode, so the case of identifiers and com- 
mands matters; MDPSim recognizes commands which are either all upper case or all lower 
case characters. 

MESSAGE fib4 

MSG:msgApply|5 

{fFib} 

4 

IONODE 



END 

Figure C-1. An Injected Application Message

This message calls the fFib function with the argument 4. The message itself can be injected by executing the
command inject fib4. The 5 is the length of the message, {fFib} is Optimist II's output name for the
function to be called (see the Optimist II output file if you are unsure about the name), 4 is the argument, and
IONODE and the word that follows it are magic numbers that cause the Reply message to be printed by MDPSim.
More than one argument can be specified, as long as the length of the message (the 5) is increased appropriately.

Once the user program has been loaded, it is a good idea to build a few templates for mes- 
sages to be injected into the program. An application message should have the format shown 
in Figure C-1. If the messages will be used for several sessions, it might be appropriate to
put them into a file and include that file. Application messages should never be injected 
before the program is installed. 
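
Message templates of this shape can be generated mechanically. The Python helper below is a minimal sketch that reproduces the layout of Figure C-1 and keeps the length field consistent with the number of arguments; the helper name application_message is hypothetical, and the two reply words are passed in by the caller rather than assumed, since their values should be taken from the MDPSim documentation.

```python
def application_message(name, function, args, reply_words):
    """Return MDPSim MESSAGE text laid out like Figure C-1.

    name        -- label used later with the inject command (e.g. "fib4")
    function    -- Optimist II's output name for the function (e.g. "{fFib}")
    args        -- argument words, in order
    reply_words -- the two words that make MDPSim print the Reply message
    """
    # The length counts the header word, the function word, every argument,
    # and the reply words.
    length = 1 + 1 + len(args) + len(reply_words)
    lines = [f"MESSAGE {name}", f"MSG:msgApply|{length}", function]
    lines += [str(a) for a in args]
    lines += [str(w) for w in reply_words]
    lines.append("END")
    return "\n".join(lines)
```

For one argument and two reply words the length field comes out as 5, as in the figure; each additional argument increases it by one.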

Instead of issuing the INCLUDE commands manually, you can also specify the files on 
MDPSim's command line, as was done in the example in Figure 5-14. 

Running Programs 

To run a program, execute the INJECT command on the message on which the program 
should be called and then run the program. Remember to specify the processor onto which 
INJECT should inject the application message; otherwise, INJECT will inject a copy of the 
message to every processor, and as many copies of the program will execute simultaneously 
as there are processors in the simulated system. 
MDPSim allows statistics to be gathered about programs which are executed on it. If the
statistics should only include data about the running program, they should be reset after the 
program is downloaded and before it is run. See the current MDPSim manual [25] and 
Figure 5-14 of this document for more details. 

When you finish the desired program runs, use the QUIT command to exit the simulator and 
the quit menu item to exit MPW. In an emergency, command-shift-period can be used to 
abort MDPSim; command-period aborts the running MDP program and returns to MDPSim's 
command line (use control-C on UNIX machines). 
This appendix is a summary of the current version of the MDP architecture. A slightly obso-
lete full version of the architecture can be found in [16]. Many details have been simplified 
in order to keep this Appendix to a reasonable length. 

Introduction 

The Message-Driven Processor is a processing node for the J-Machine, a message-passing 
concurrent computer. The MDP is designed to provide support for fine-grained concurrent 
computation. Towards this goal the processor includes hardware for message queueing, low- 
latency message dispatching, and message sending. The same chip also contains a network 
interface and a router to allow the routing of messages throughout the network without any 
processor intervention. 

The size of the MDP's register set is limited to minimize context-switching time. Much of the 
memory is on the chip to improve performance and reduce the chip's pin count and the chip 
count for the concurrent computer. Having memory on chip allows more flexibility in the use 
of memory than in designs with off-chip memory. For example, a portion of memory may be 
designated as a two-way set-associative cache to be used by the xlate instruction. Never- 
theless, since current technological limitations restrict the size of the on-chip memory to 
about 4096 words, an external memory interface has been provided to allow access to slow, 
off-chip DRAM. 

The MDP is also designed to efficiently support object-oriented programming. Every MDP 
word consists of 32 data bits and a 4-bit tag that classifies the word as an integer, boolean,
address, instruction, pointer, or other data. The MDP's four address registers include both 
base addresses and lengths, so all memory accesses are bounds checked. Normally the ad- 
dress registers point to objects, so, since absolute memory addressing is not allowed except by 
the operating system, memory references can only be made to objects relative to their begin- 
nings. Having tags and no absolute references permits the use of garbage collection and 
transparent migration of objects to other MDP nodes on the network. 

The MDP is almost completely message-driven. It is controlled by the messages arriving 
from the network that are automatically queued and processed. There are two priority levels 
to allow urgent messages to interrupt normal processing. There is also limited support for a 
background mode of execution when no messages are waiting in the queues. 

Processor State 

The processor state of the MDP is kept in a set of registers shown in Figure D-1. There are
three independent copies of most registers for each of the two priorities of the MDP, allowing 
easy priority switches while keeping the integrity of the registers. The registers are symboli- 
cally represented as follows: 

R0-R3                  general-purpose data registers
A0-A3                  address registers
ID0-ID3                ID registers
Q, M, U, F, I, P, B    flags
IP                     instruction pointer register
FIR                    faulted instruction register
FIP                    faulted instruction pointer register
FOP0, FOP1             faulted operand registers
QBM                    queue base/limit register
QHL                    queue head/tail register
TBM                    translation base/mask register
NNR                    node number register
MAR                    memory address register



The Q flag controls message queue access through register A3, while the M flag guards against 
inter-priority message deadlocks. Setting the U (unchecked mode) flag disables type and 
overflow faults. Setting the F (faulted) flag vectors all faults to the CATASTROPHE vector; this 
flag is often set in critical sections of fault handlers. Setting the I (interrupt) flag prevents 
higher-priority interrupts. The B and P flags encode the current priority level. 
Figure D-1. The MDP Register Set.
Data Types

The data types that may be used in a word are shown in Figure D-2. All data types except
FUT and CFUT may be moved, compared with EQ and NEQ, XLATEd and ENTERed, RTAGged,
WTAGged, CHECKed, and executed. Executing a non-INST word causes it to be loaded into R0.
Some data types allow additional operations, which are listed in detail in the description of
the instruction set.



Type       Tag (bits 35-32)   Data (bits 31-0)

SYM        0000               value (0 = NIL)
INT        0001               two's complement value
BOOL       0010               0 ... 0 b
ADDR       0011               r, base, length
IP         0100               u, f, offset, p, a
MSG        0101               offset, length
CFUT       0110               user-defined
FUT        0111               user-defined
TAG8       1000               user-defined
TAG9       1001               user-defined
TAGA       1010               user-defined
TAGB       1011               user-defined
INST0      1100               first instruction, second instruction
INST1      1101               first instruction, second instruction
INST2      1110               first instruction, second instruction
INST3      1111               first instruction, second instruction

Figure D-2. The MDP Data Types. 
• SYM contains an atomic symbol. EQUAL and NEQUAL are allowed on symbols. If the data
portion of a symbol contains all zeroes, the word takes on the value of NIL. Cosmos
renames SYM as TAG0 and inserts a subtag in bits 28 through 31 to distinguish between a
few more types.

• INT contains a two's complement integer between -2^31 and 2^31 - 1, inclusive. All
arithmetic, logical, and comparison operations are allowed on INTs.

• BOOL contains a boolean value, which is either true (b=1) or false (b=0). All logical and
comparison operations are allowed on BOOLs; false is considered to be less than true.

• ADDR contains a base/length pair that may be loaded into either one of the address
registers or QBM, QHL, or TBM. The uses of bits 30 and 31 vary among these registers.

• IP contains a value appropriate for loading into the IP.

• MSG is the header of a message. It is similar to an IP. Due to a shortage of tags,
Cosmos also uses this tag under the name OBJ as an object header.

• CFUT contains a context future. Almost all operations fault on context futures. They are
not meant to be MOVEable. CFUTs are used as placeholders for unavailable values to be
computed in parallel by other processes; an attempt to read a CFUT before its value is
available will fault, and the operating system will suspend the current process until the
value is available.

• FUT is a standard future. FUTs may be moved, and their tags may be read and written,
but they may not participate in any primitive operations such as addition or checking for
equality. As with CFUTs, an attempt to use a FUT in a primitive operation will cause a
fault, and the operating system will have to provide the appropriate value for the FUT.

• TAG8 through TAGB are tags for operating system-defined words. They cause faults on
all primitive operations except EQ, NEQ, BNIL, and BNNIL. Cosmos renames these tags
as ID, DID, TAGA, and FLOAT, respectively.

• INST0 through INST3 are tags for instructions. The two instructions in a word occupy a
total of 34 bits, so two tag bits are also used to encode them.
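
The tagged-word layout can be modelled in a few lines of Python. This is a sketch under stated assumptions: the tag is taken to occupy the top four bits of the 36-bit word, the tag numbering (SYM = 0 through INST3 = 15) is read off Figure D-2, and the first instruction is assumed to sit in the high 17 bits of the 34-bit instruction field. The function names are hypothetical.

```python
TAGS = ["SYM", "INT", "BOOL", "ADDR", "IP", "MSG", "CFUT", "FUT",
        "TAG8", "TAG9", "TAGA", "TAGB", "INST0", "INST1", "INST2", "INST3"]

DATA_BITS = 32
DATA_MASK = (1 << DATA_BITS) - 1
INST_BITS = 17


def pack(tag, data):
    """Build a 36-bit word from a tag name and a 32-bit data field."""
    return (TAGS.index(tag) << DATA_BITS) | (data & DATA_MASK)


def unpack(word):
    """Split a 36-bit word into (tag name, 32-bit data field)."""
    return TAGS[word >> DATA_BITS], word & DATA_MASK


def make_int(value):
    """Tag a Python integer as INT, checking the two's complement range."""
    if not -2**31 <= value <= 2**31 - 1:
        raise OverflowError("value does not fit in an INT word")
    return pack("INT", value)


def int_value(word):
    """Recover the signed value of an INT word."""
    tag, data = unpack(word)
    if tag != "INT":
        raise TypeError("word is tagged " + tag + ", not INT")
    return data - (1 << DATA_BITS) if data & (1 << (DATA_BITS - 1)) else data


def pack_instructions(first, second):
    """Pack two 17-bit instructions into one word.

    The 34 instruction bits spill into the low two tag bits, so the top two
    tag bits (11) mark the word as INST0 through INST3.
    """
    limit = 1 << INST_BITS
    if not (0 <= first < limit and 0 <= second < limit):
        raise ValueError("instructions must fit in 17 bits")
    return (0b11 << (2 * INST_BITS)) | (first << INST_BITS) | second
```

Under these assumptions the tag reported by unpack for an instruction word depends on the top two bits of the first instruction, which is why four INST tags are needed.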

Network Interface 

Incoming messages are queued in message queues before being dispatched and processed. 
There are two message queues, one for each priority level. When a message arrives, register 
A3 is set up to point to it in the message queue, and execution begins at the address specified 
by the message header. A message may be processed as soon as its first word arrives; the 
processor does not wait until the entire message is present before processing it. Memory ac- 
cesses to the message are checked to make sure that the processor does not try to access a 
word in the message before it arrives; if the processor tries to access a word too early, it waits 
until the word has arrived. 

The SUSPEND instruction informs the hardware that the processing of the current message is 
done and that it should fetch the next message. 

Message Transmission 

The SEND, SEND2, SENDE, and SEND2E instructions are used to send messages. The first 
word sent specifies the node number of the destination node (i.e., the destination node's NNR
value) in the low 16 bits. The SEND instruction will use the current node's NNR and the
destination node number to find the relative offsets in the X and Y dimensions that the network
controllers will use in routing the messages through the network. There are actually two
flavors of each send instruction: SEND0, SEND20, SENDE0, and SEND2E0 send words of priority 0
messages, while SEND1, SEND21, SENDE1, and SEND2E1 send words of priority 1 messages.
The priority of the message is independent of the priority of the process that is sending it. 

The initial routing word is followed by a number of words which the network delivers verba- 
tim to the destination node. The network does not examine the contents of these words. The 
message is terminated by a SENDE or SEND2E instruction, which sends the last one or two
words, respectively, and tells the network to actually transmit the message. The first
word that arrives at the destination node (the second word actually sent since the routing 
word is only used by the network and doesn't arrive at the destination node) must be tagged 
MSG. It contains the length of that message including that word but not including the routing 
word preceding it. It also contains the initial value of the IP at which execution is supposed 
to start. The destination node will take a MSG fault if this word is incorrect.

The total time between the first SEND and the SENDE should be as short as possible to avoid 
blocking the network. For the same reason, faults should be avoided while sending. 
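
As an illustration of how the send instructions divide up a message, the Python sketch below picks a sequence of mnemonics for a given number of words. It assumes that SEND2 transmits two words in the same way that SEND2E transmits the final two, and it prefers the two-word forms; the function name plan_sends is hypothetical, and operands are not modelled.

```python
def plan_sends(n_words, priority=0):
    """Return send mnemonics for a message of n_words words.

    n_words counts the routing word, the MSG header, and every word after it,
    so it must be at least 2. The last instruction is always a SENDE or a
    SEND2E, which also launches the message.
    """
    if n_words < 2:
        raise ValueError("a message needs a routing word and a MSG header")
    if priority not in (0, 1):
        raise ValueError("priority must be 0 or 1")
    ops = []
    remaining = n_words
    while remaining > 2:
        ops.append(f"SEND2{priority}")  # two words, message stays open
        remaining -= 2
    ops.append(f"SEND2E{priority}" if remaining == 2 else f"SENDE{priority}")
    return ops
```

For example, a routing word followed by a five-word message (six words in all) is covered by two SEND20 instructions and a final SEND2E0.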

Fault Processing 

When a fault occurs, the instruction that caused the fault is saved in the FIR register, the 
current IP (which points one instruction beyond the faulting instruction) is saved in the FIP 
register, and the values of the instruction operands, if any, are saved in the FOP0 and FOP1
registers. If the fault occurred while fetching an instruction, the FIR is set to NIL and the
FIP points to the instruction. The IP is then fetched from the memory location whose
address is equal to the fault number plus the base of the fault vector table of the current
priority. If the F bit was set, the IP is loaded from the catastrophe vector instead. The U, A,
and F flags receive their new values from the loaded IP. The faults are listed in Table D-1.
Table D-1. MDP Faults

Name          Fault Number   Description

CATASTROPHE   $00            Double fault, bad vector, or other catastrophe.
INTERRUPT     $01            Interrupt pin has gone active.
QUEUE         $02            Message queue about to overflow.
SEND          $03            Send buffer full.
ILGINST       $04            Illegal instruction.
DRAMERR       $05            Double bit error in the external RAM.
INVADR        $06            Attempt to access data through address register with I bit set.
LIMIT         $07            Attempt to access object data past limit.
ADRTYPE       $08            Index in indexed addressing mode not tagged INT.
EARLY         $09            Attempt to access data in message queue before it arrived.
MSG           $0A            Bad message header.
XLATE         $0B            XLATE missed.
OVERFLOW      $0C            Integer arithmetic overflow.
CFUT          $0D            Attempted operation on a word tagged CFUT.
FUT           $0E            Attempted operation on a word tagged FUT.
TAG8          $0F            Attempted operation on a word tagged TAG8.
TAG9          $10            Attempted operation on a word tagged TAG9.
TAGA          $11            Attempted operation on a word tagged TAGA.
TAGB          $12            Attempted operation on a word tagged TAGB.
TYPE          $13            An operand or a combination of operands with a bad tag type
                             used in an instruction.
              $14-$1F        Reserved for future faults.


If multiple faults occur simultaneously the fault vector chosen is the one that has the highest precedence. Each 
fault is assigned a precedence by its fault number; lower fault numbers correspond to higher precedence. 
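
The precedence rule and the vector lookup described under Fault Processing can be sketched in a few lines of Python. This is only an illustrative model, not part of the MDP or Cosmos sources: the function names are hypothetical, the table bases $80 and $A0 are taken from the Faults0Start and Faults1Start equates in the Cosmos listing later in this appendix, and the sketch assumes that the catastrophe vector is taken when the F bit is already set and that it lives in the table of the current priority.

```python
# Illustrative model only; names are hypothetical.
# Assumed table bases, from the Faults0Start/Faults1Start equates in Cosmos.i.
FAULT_TABLE_BASE = {0: 0x80, 1: 0xA0}
CATASTROPHE = 0x00


def select_fault(pending_faults):
    """Pick the fault to service: lower fault numbers have higher precedence."""
    return min(pending_faults)


def vector_address(fault_number, priority, f_bit_set=False):
    """Address of the word from which the new IP is fetched.

    Assumption: when the F bit is already set, the catastrophe vector
    (fault number $00) of the current priority is used instead.
    """
    if f_bit_set:
        fault_number = CATASTROPHE
    return FAULT_TABLE_BASE[priority] + fault_number
```

For example, if LIMIT ($07) and CFUT ($0D) are raised together, `select_fault` returns $07, and at priority 0 the new IP would be fetched from address $87 under these assumptions.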



Instruction Encoding 

The program executed by the MDP consists of instructions and constants. A constant is any
word not tagged INST0 through INST3 that is encountered in the instruction stream. When
a constant word is encountered, that word is loaded into R0 and execution proceeds with the
next word (the assembler syntax for including a word in the code stream is DC).

Every instruction is 17 bits long. Two 17-bit instructions are packed into a word. Since a
word has only 32 data bits, two tag bits are also used to specify the instructions. The
instruction in the high part of the word is executed first, followed by the instruction in the
low part of the word. As a matter of convention, if only one instruction is present in a word,
it should be placed in the high part, and the low part of the word set to all zeros.
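
As a rough illustration of the packing just described, the following Python sketch treats the 32 data bits plus the two borrowed tag bits as a single 34-bit payload, with the first-executed instruction in the upper 17 bits. The function names and the exact placement of the two halves within the payload are assumptions made for the example; the text above fixes only the sizes and the execution order.

```python
# Illustrative sketch; the layout of the two halves is an assumption.
INSTR_BITS = 17
INSTR_MASK = (1 << INSTR_BITS) - 1


def pack_instructions(high, low=0):
    """Pack two 17-bit instructions into a 34-bit payload.

    By convention a lone instruction goes in the high part and the
    low part is left as all zeros.
    """
    if not (0 <= high <= INSTR_MASK and 0 <= low <= INSTR_MASK):
        raise ValueError("instructions must fit in 17 bits")
    return (high << INSTR_BITS) | low


def unpack_instructions(payload):
    """Return (high, low); the high instruction is executed first."""
    return (payload >> INSTR_BITS) & INSTR_MASK, payload & INSTR_MASK
```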

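The format of an instruction is as follows:

  16        11 10     9 8      7 6               0
 +------------+--------+--------+-----------------+
 |   Opcode   | 2nd    | 1st    | Addressing mode |
 |            | reg #  | reg #  |                 |
 +------------+--------+--------+-----------------+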

The opcode field specifies one of 64 possible instructions. The other fields specify three
operands; instructions that don't require three operands ignore some of the operand fields.
Operands 1 and 2 must be data registers; their numbers (0 through 3) are encoded in the 1st
reg # and 2nd reg # fields. Operand 2, if used, is always the destination of an operation and
operand 1, if used, is always a source.
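
The field layout can be made concrete with a small encoder and decoder. This is a sketch, not the assembler's actual code: the names are hypothetical, and the bit positions (opcode in bits 16-11, 2nd reg # in bits 10-9, 1st reg # in bits 8-7, addressing mode in bits 6-0) are inferred from the 17-bit instruction size, the 64 opcodes, and the four data registers.

```python
# Illustrative sketch; field positions are inferred, names are hypothetical.
from typing import NamedTuple


class Instruction(NamedTuple):
    opcode: int    # 6 bits: one of 64 instructions
    reg2: int      # 2 bits: operand 2, the destination data register
    reg1: int      # 2 bits: operand 1, a source data register
    operand0: int  # 7 bits: addressing-mode field


def encode(instr):
    """Assemble the four fields into one 17-bit instruction."""
    assert 0 <= instr.opcode < 64 and 0 <= instr.operand0 < 128
    assert 0 <= instr.reg1 < 4 and 0 <= instr.reg2 < 4
    return (instr.opcode << 11) | (instr.reg2 << 9) | (instr.reg1 << 7) | instr.operand0


def decode(word17):
    """Split a 17-bit instruction into its four fields."""
    return Instruction(
        opcode=(word17 >> 11) & 0x3F,
        reg2=(word17 >> 9) & 0x3,
        reg1=(word17 >> 7) & 0x3,
        operand0=word17 & 0x7F,
    )
```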



[Figure D-3 encoding diagram: bit patterns of the normal addressing mode field for the modes Rn, An, the immediate constants, [Rx,An], imm, and [imm,An].]


Syntax       Addressing Mode

Rn           Data register Rn
An           Address register An
nil          Immediate constant NIL (SYM:0)
false        Immediate constant FALSE (BOOL:0)
true         Immediate constant TRUE (BOOL:1)
$80000000    Immediate constant INT:$80000000
$ff          Immediate constant INT:$000000FF
$3ff         Immediate constant INT:$000003FF
$ffff        Immediate constant INT:$0000FFFF
$fffff       Immediate constant INT:$000FFFFF
[Rx,An]      Offset Rx in object An
imm          Immediate imm (signed)
[imm,An]     Offset imm (unsigned) in object An

Figure D-3. The MDP Normal Addressing Modes.

The immediate constants are eight immediate values outside the range INT:-16..INT:15. They are provided for
convenience and code density improvement. The $FF and $FFFF constants are useful for masking bytes and
words, while the $3FF and $FFFFF constants may be used for masking lengths and addresses.

Operand 0 can be used as a source or a destination in an instruction. It can hold two possible
encodings. A normal instruction has op0 address mode encodings as shown in Figure D-3.
The register-oriented op0 mode is used only by three variants of the MOVE instruction. If an
instruction uses the register-oriented op0, the encodings are as in Figure D-4.

Instruction Set Summary 

The instructions supported by the MDP are summarized in Table D-2. The Types column 
specifies the types on which the instruction operates; if the arguments have different types, 
the instruction faults. Except for a move to memory, all instructions fault when any of their 
operands are tagged cfut. Also, except for moves and sends, all instructions fault when any 
of their operands are tagged fut. 

Syntax   Addressing Mode

Rn       Data register Rn
An       Address register An
IDn      ID register IDn
FIP      Trapped Instruction Pointer
FIR      Trapped Instruction Register
FOP0     Trapped OP0 register
FOP1     Trapped OP1 register
QBM      Queue Base/Mask register
QHL      Queue Head/Length register
IP       Instruction Pointer register
TBM      Translation Base/Mask register
NNR      Node Number register
MAR      Memory Address Bus register
         Unused (ILGINST fault)
         Unused (ILGINST fault)
P        Priority Level flag
B        Background Execution flag
I        Interrupt flag
F        Fault flag
U        Unchecked flag
Q        A3 Queue flag
         Unused (ILGINST fault)
         Unused (ILGINST fault)

Figure D-4. The MDP Register Oriented Addressing Modes.

B and P represent the priority of the register being accessed XORed with the current priority. For example, 00
indicates the current priority, while 01 would let priority 1 access priority 0's registers, and 11 would let priority 1
access the background registers. The assembler syntax for specifying a register belonging to the other priority is
the register name followed by a b to flip the B bit and/or a backquote (`) to flip the P bit.



Table D-2. MDP Instructions

Instruction         Brief Description                                                 Types

General Movement and Type Instructions
MOVE    Src,Rd      Rd ← Src. Src may be a register addressing mode.                  All
MOVE    Rs,Dst      Dst ← Rs. Dst may be a register addressing mode.                  All
MOVE    Src,IP      IP ← Src. Src may be a register addressing mode.                  All
RTAG    Src,Rd      Rd ← INT:tag(Src)                                                 All
WTAG    Rs,Src,Rd   Rd ← Src:Rs                                                       All
CHECK   Rs,Src,Rd   Rd ← BOOL:tag(Rs)=Src                                             All

Arithmetic and Logical Instructions
NEG     Src,Rd      Rd ← -Src                                                         INT
ADD     Rs,Src,Rd   Rd ← Rs+Src                                                       INT
SUB     Rs,Src,Rd   Rd ← Rs-Src                                                       INT
CARRY   Rs,Src,Rd   Rd ← Carry from the addition of Rs and Src                        INT
MUL     Rs,Src,Rd   Rd ← Rs*Src                                                       INT
MULH    Rs,Src,Rd   Rd ← High 32 bits of 64-bit unsigned product of Rs and Src        INT
ASH     Rs,Src,Rd   Rd ← Rs shifted left arithmetically by Src bits                   INT
LSH     Rs,Src,Rd   Rd ← Rs shifted left logically by Src bits                        INT
ROT     Rs,Src,Rd   Rd ← Rs rotated left by Src (mod 32) bits                         INT
FFB     Src,Rd      Rd ← 31 - position of leftmost bit of Rs differing from bit 31    INT
NOT     Src,Rd      Rd ← NOT Src                                                      INT, BOOL
AND     Rs,Src,Rd   Rd ← Rs AND Src                                                   INT, BOOL
OR      Rs,Src,Rd   Rd ← Rs OR Src                                                    INT, BOOL
XOR     Rs,Src,Rd   Rd ← Rs XOR Src                                                   INT, BOOL
LT      Rs,Src,Rd   Rd ← BOOL:Rs<Src                                                  INT, BOOL
LE      Rs,Src,Rd   Rd ← BOOL:Rs≤Src                                                  INT, BOOL
GT      Rs,Src,Rd   Rd ← BOOL:Rs>Src                                                  INT, BOOL
GE      Rs,Src,Rd   Rd ← BOOL:Rs≥Src                                                  INT, BOOL
EQUAL   Rs,Src,Rd   Rd ← BOOL:Rs=Src                                                  SYM, INT, BOOL
NEQUAL  Rs,Src,Rd   Rd ← BOOL:Rs≠Src                                                  SYM, INT, BOOL
EQ      Rs,Src,Rd   Rd ← BOOL:Rs=Src (Pointer comparison only)                        All
NEQ     Rs,Src,Rd   Rd ← BOOL:Rs≠Src (Pointer comparison only)                        All

Network Instructions
SEND    Src         Send Src onto the network                                         All
SENDE   Src         Send Src onto the network and terminate message                   All
SEND2   Rs,Src      Send Rs and Src onto the network                                  All
SEND2E  Rs,Src      Send Rs and Src onto the network and terminate message            All

Associative Lookup Table Instructions
XLATE   Rs,Dst,c    Dst ← associative lookup in the associative lookup table of Rs    All
ENTER   Src,Rs      Enter (Src, Dst) into the associative lookup table                All
PROBE   Src,Rd      Rd ← BOOL: Src is in the associative lookup table                 All

Special Instructions
NOP                 No operation
INVAL               Invalidate all relocatable address registers
SUSPEND             Terminate current process and fetch another message
CALL    Src         Call system routine numbered Src

Branches
BR      Src         Branch forward Src words
BNIL    Rs,Src      Branch forward Src words if Rs is nil                             All
BNNIL   Rs,Src      Branch forward Src words if Rs is not nil                         All
BF      Rs,Src      Branch forward Src words if Rs is false                           BOOL
BT      Rs,Src      Branch forward Src words if Rs is true                            BOOL
BZ      Rs,Src      Branch forward Src words if Rs is zero                            INT
BNZ     Rs,Src      Branch forward Src words if Rs is non-zero                        INT
Cosmos.i

MDP Operating System
version 2.3

written by
Waldemar Horwat

Master's thesis under Prof. William Dally

March 28, 1989
May 1991

Send problems and comments to
waldemar@hx.lcs.mit.edu.

Copyright 1989, 1990, 1991 Waldemar Horwat

;| Parameters

;These parameters are used to customize Cosmos. You can override the default settings of
;these parameters by using -d REALMODE=1, etc. command line options.

;REALMODE is true if the code should be compiled for a real J-Machine instead of MDPSim.
;This turns off the STOP instruction (this means you can't use RUN).
        IF      !Defined(REALMODE)
        LABEL   REALMODE = 0
        END

;FASTSIM is true if the loop that clears memory to CFUTs should be skipped.
        IF      !Defined(FASTSIM)
        LABEL   FASTSIM = 1
        END

;DEBUG is true if extra debugging code should be run.
        IF      !Defined(DEBUG)
        LABEL   DEBUG = 1
        END

;| Equates

        LABEL   LogNNodes = LGNNODES
        LABEL   NNodes = 1<<LogNNodes

        LABEL   nFastContexts = 8               ;Number of fast contexts to allocate.



;| Memory Map

        LABEL   Globals0Start = 0
        LABEL   Globals0End = $40
        LABEL   Globals1Start = $40
        LABEL   Globals1End = $80
        LABEL   ADR Faults0Start = $80
        LABEL   ADR Faults0End = $A0
        LABEL   ADR Faults1Start = $A0
        LABEL   ADR Faults1End = $C0
        LABEL   ADR CallsStart = $C0
        LABEL   ADR CallsEnd = $100
        LABEL   ADR Queue1Start = $100
        LABEL   ADR Queue1End = $100
        LABEL   ADR Queue0Start = $100
        LABEL   ADR Queue0End = $200
        LABEL   ADR XlateStart = $200
        LABEL   ADR XlateEnd = $400
        LABEL   BRATLenLog = 6
        LABEL   BRATLength = 1<<BRATLenLog
        LABEL   ADR BRATStart = $400
        LABEL   ADR BRATEnd = BRATStart+BRATLength
        LABEL   ADR HeapEnd = MEMSIZE

;| Tags

        LABEL   TAG TAG0 = 0                    ;Immediate object tag.
        LABEL   TAG OBJ = MSG                   ;Objects and messages have the same tag.
        LABEL   TAG CS = INST1                  ;Class/selector.

;Subtags of TAG0:
        LABEL   subtagN = 28                    ;Subtag offset.
        LABEL   subtagL = 4                     ;Subtag length.
        LABEL   subtagM = (1<<subtagL)-1        ;Subtag mask.
        LABEL   subSYM = 0                      ;Symbol.
        LABEL   subCLASS = 1                    ;Class.
        LABEL   subSEL = 2                      ;Selector.
        LABEL   subCHAR = 3                     ;Character.



;| Types

;Address bits
        LABEL   lengthN = 0                     ;Length field offset.
        LABEL   lengthL = 10                    ;Length field length.
        LABEL   lengthM = (1<<lengthL)-1        ;Length field mask.
        LABEL   baseN = 10                      ;Base field offset.
        LABEL   baseL = 20                      ;Base field length.
        LABEL   baseM = (1<<baseL)-1            ;Base field mask.
        LABEL   invalidN = 30                   ;Invalid address.
        LABEL   invalid = 1<<invalidN
        LABEL   relN = 31                       ;Relocatable address.
        LABEL   rel = 1<<relN
        LABEL   disableN = 30                   ;Disable bit of QBM regs.
        LABEL   disable = 1<<disableN

;IP bits
        LABEL   absN = 8                        ;Absolute IP.
        LABEL   abs = 1<<absN
        LABEL   phaseN = 9                      ;IP phase bit.
        LABEL   phase = 1<<phaseN
        LABEL   offsetN = 10                    ;Offset field offset.
        LABEL   offsetL = 20                    ;Offset field length.
        LABEL   faultN = 30                     ;Fault flag.
        LABEL   fault = 1<<faultN
        LABEL   uncheckedN = 31                 ;Unchecked mode flag.
        LABEL   unchecked = 1<<uncheckedN

;ID bits
        LABEL   homeNodeN = 0                   ;Home node.
        LABEL   homeNodeL = 16
        LABEL   homeNodeM = (1<<homeNodeL)-1
        LABEL   serialN = 16                    ;Serial number.
        LABEL   serialL = 15
        LABEL   serialM = (1<<serialL)-1
        LABEL   distobjMemberN = 31             ;Distributed object member flag.
        LABEL   distobjMember = 1<<distobjMemberN

;DID bits
        LABEL   initialNodeN = 0                ;Initial node.
        LABEL   initialNodeL = 11
        LABEL   initialNodeM = (1<<initialNodeL)-1
        LABEL   logStrideN = 11                 ;2's complement lg(#nodes/#constituents)
        LABEL   logStrideL = 5
        LABEL   logStrideM = (1<<logStrideL)-1

;Class/Selector bits
        LABEL   csSelectorN = 0
        LABEL   csSelectorL = 16
        LABEL   csSelectorM = (1<<csSelectorL)-1
        LABEL   csClassN = 16
        LABEL   csClassL = 16
        LABEL   csClassM = (1<<csClassL)-1

;xyz bits
        LABEL   xN = 0                          ;X field offset.
        LABEL   xL = LGXNODES                   ;X field length.
        LABEL   xM = (1<<xL)-1                  ;X field mask.
        LABEL   xMC = (1<<5-xL)-1               ;X complement field mask.
        LABEL   yN = 5                          ;Y field offset.
        LABEL   yL = LGYNODES                   ;Y field length.
        LABEL   yM = (1<<yL)-1                  ;Y field mask.
        LABEL   yMC = (1<<5-yL)-1               ;Y complement field mask.
        LABEL   zN = 10                         ;Z field offset.
        LABEL   zL = LGZNODES                   ;Z field length.
        LABEL   zM = (1<<zL)-1                  ;Z field mask.
        LABEL   zMC = (1<<6-zL)-1               ;Z complement field mask.

;These constants are used to fashion serial and node numbers for precompiled objects.
        LABEL   mX = xM
        LABEL   sX = 0
        LABEL   mY = yM<<xL
        LABEL   sY = yN-xL
        LABEL   mZ = zM<<xL+yL
        LABEL   sZ = zN-xL-yL
        LABEL   mS = serialM<<xL+yL+zL
        LABEL   sS = serialN-xL-yL-zL
;The nth object is stored at (n&mX)<<sX|(n&mY)<<sY|(n&mZ)<<sZ|(n&mS)<<sS

;These constants are used to fashion numbers for precompiled classes and selectors
;so as to distribute them evenly throughout the J-Machine.
        LABEL   m3 = xMC<<xL+yL+zL
        LABEL   s3 = -yL-zL
        LABEL   m4 = yMC<<xN+yL+zL
        LABEL   s4 = -zL
        LABEL   m5 = zMC<<xN+yN+zL
        LABEL   s5 = 0
;The nth object is stored at (n&mX)<<sX|(n&mY)<<sY|(n&mZ)<<sZ|(n&m3)<<s3|(n&m4)<<s4|(n&m5)<<s5

        LABEL   nodeMask = zM<<zN|yM<<yN|xM<<xN ;Mask for generating random node numbers.
        LABEL   RandomSeedIncrement = 5<<zN-2|3<<yN-1|1<<xN
;Hardwired classes:
        LABEL   classPrimitiveClass = (2&mX)<<sX|(2&mY)<<sY|(2&mZ)<<sZ|(2&m3)<<s3|(2&m4)<<s4|(2&m5)<<s5
        LABEL   classStandardClass = (3&mX)<<sX|(3&mY)<<sY|(3&mZ)<<sZ|(3&m3)<<s3|(3&m4)<<s4|(3&m5)<<s5
        LABEL   classDistributedClass = (4&mX)<<sX|(4&mY)<<sY|(4&mZ)<<sZ|(4&m3)<<s3|(4&m4)<<s4|(4&m5)<<s5
        LABEL   classObject = (5&mX)<<sX|(5&mY)<<sY|(5&mZ)<<sZ|(5&m3)<<s3|(5&m4)<<s4|(5&m5)<<s5
        LABEL   classNull = (6&mX)<<sX|(6&mY)<<sY|(6&mZ)<<sZ|(6&m3)<<s3|(6&m4)<<s4|(6&m5)<<s5
        LABEL   classSymbol = (7&mX)<<sX|(7&mY)<<sY|(7&mZ)<<sZ|(7&m3)<<s3|(7&m4)<<s4|(7&m5)<<s5
        LABEL   classClass = (8&mX)<<sX|(8&mY)<<sY|(8&mZ)<<sZ|(8&m3)<<s3|(8&m4)<<s4|(8&m5)<<s5
        LABEL   classSelector = (9&mX)<<sX|(9&mY)<<sY|(9&mZ)<<sZ|(9&m3)<<s3|(9&m4)<<s4|(9&m5)<<s5
        LABEL   classCharacter = (10&mX)<<sX|(10&mY)<<sY|(10&mZ)<<sZ|(10&m3)<<s3|(10&m4)<<s4|(10&m5)<<s5
        LABEL   classInteger = (11&mX)<<sX|(11&mY)<<sY|(11&mZ)<<sZ|(11&m3)<<s3|(11&m4)<<s4|(11&m5)<<s5
        LABEL   classBoolean = (12&mX)<<sX|(12&mY)<<sY|(12&mZ)<<sZ|(12&m3)<<s3|(12&m4)<<s4|(12&m5)<<s5
        LABEL   classFalse = (13&mX)<<sX|(13&mY)<<sY|(13&mZ)<<sZ|(13&m3)<<s3|(13&m4)<<s4|(13&m5)<<s5
        LABEL   classTrue = (14&mX)<<sX|(14&mY)<<sY|(14&mZ)<<sZ|(14&m3)<<s3|(14&m4)<<s4|(14&m5)<<s5
        LABEL   classFloat = (15&mX)<<sX|(15&mY)<<sY|(15&mZ)<<sZ|(15&m3)<<s3|(15&m4)<<s4|(15&m5)<<s5
        LABEL   classFunction = (16&mX)<<sX|(16&mY)<<sY|(16&mZ)<<sZ|(16&m3)<<s3|(16&m4)<<s4|(16&m5)<<s5
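
The way the mask/shift pairs scatter the bits of a sequence number n across the X, Y, Z, and serial fields is easier to see in a small model. The Python sketch below covers only the four-term formula for precompiled objects (the one using mX, mY, mZ, and mS). It assumes that `+` and `-` bind more tightly than `<<` in the assembler's expressions, as they do in Python, and it takes the field offsets (0, 5, 10, 16) and the 15-bit serial field from the equates above; the function name and its parameters are hypothetical.

```python
# Illustrative sketch of the "nth object" formula for precompiled objects.
# Assumes '+'/'-' bind tighter than '<<' in the assembler expressions.
X_N, Y_N, Z_N = 0, 5, 10        # field offsets within a node number
SERIAL_N, SERIAL_L = 16, 15     # serial number field of an ID


def nth_object_id(n, lg_x, lg_y, lg_z):
    """Spread the low bits of n over the X, Y, Z fields; the rest is the serial."""
    x_m, y_m, z_m = (1 << lg_x) - 1, (1 << lg_y) - 1, (1 << lg_z) - 1
    serial_m = (1 << SERIAL_L) - 1

    m_x, s_x = x_m, X_N
    m_y, s_y = y_m << lg_x, Y_N - lg_x
    m_z, s_z = z_m << (lg_x + lg_y), Z_N - lg_x - lg_y
    m_s, s_s = serial_m << (lg_x + lg_y + lg_z), SERIAL_N - lg_x - lg_y - lg_z

    return (((n & m_x) << s_x) | ((n & m_y) << s_y)
            | ((n & m_z) << s_z) | ((n & m_s) << s_s))
```

With a 2x2x2 machine (`lg_x = lg_y = lg_z = 1`), consecutive values of n land on different nodes before the serial number is ever incremented, which is the even distribution the comments describe.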



;| Objects

        LABEL   objectHeader = 0
        LABEL   objectID = 1

;Object header bits:
        LABEL   hdrLengthN = 0                  ;Length field offset.
        LABEL   hdrLengthL = 10                 ;Length field length.
        LABEL   hdrLengthM = (1<<hdrLengthL)-1  ;Length field mask.
        LABEL   hdrClassN = 10                  ;Class field offset.
        LABEL   hdrClassL = 16                  ;Class field length.
        LABEL   hdrClassM = (1<<hdrClassL)-1    ;Class field mask.
        LABEL   hdrFastN = 26                   ;Fast context.
        LABEL   hdrFast = 1<<hdrFastN
        LABEL   hdrDeletedN = 27                ;Free object.
        LABEL   hdrDeleted = 1<<hdrDeletedN
        LABEL   hdrCopyableN = 28               ;Immutable copyable object.
        LABEL   hdrCopyable = 1<<hdrCopyableN
        LABEL   hdrPurgeableN = 29              ;Purgeable object.
        LABEL   hdrPurgeable = 1<<hdrPurgeableN
        LABEL   hdrLockedN = 30                 ;Locked object.
        LABEL   hdrLocked = 1<<hdrLockedN
        LABEL   hdrMarkedN = 31                 ;Purgeable object marked by sweeper.
        LABEL   hdrMarked = 1<<hdrMarkedN

;Class objects:
        LABEL   oClassWord = 2                  ;Header word for objects of this class.
        LABEL   oClassNAllSupers = 3            ;Count of all superclasses for this class.
        LABEL   oClassAllSupers = 4             ;List of all superclasses for this class.

;Selectors:
        LABEL   oSelNMethods = 2                ;Number of methods defined for this selector.
        LABEL   oSelMethods = 3                 ;List of class/method pairs for this selector.

;Functions:
        LABEL   oFunctionNArgs = 2              ;Number of arguments or NIL.
        LABEL   oFunctionCode = 3               ;Code of function.

;Closures:
        LABEL   oClosureNArgs = 2               ;Number of arguments or NIL.
        LABEL   oClosureCode = 3                ;Faulting instruction.
        LABEL   oClosureFunct = 4               ;Function to be called.
        LABEL   oClosureDisplay = 5             ;Additional display arguments.

;Distobjs:
        LABEL   oDistobjGroup = 2               ;DID of a distributed object.
        LABEL   oDistobjIndex = 3               ;Constituent number of a constituent.
        LABEL   oDistobjLogicalLimit =          ;Logical number of constituents in a distributed object.
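
The object header layout defined by the hdr* equates can be exercised with a short Python model. This is a sketch for illustration only: the function and dictionary names are hypothetical, while the field offsets and widths (a 10-bit length at bit 0, a 16-bit class at bit 10, and flag bits 26 through 31) are copied from the equates above.

```python
# Illustrative model of the object header word; names are hypothetical,
# bit positions follow the hdr* equates in the listing.
HDR_LENGTH_N, HDR_LENGTH_L = 0, 10
HDR_CLASS_N, HDR_CLASS_L = 10, 16
HDR_FLAGS = {
    "fast": 26, "deleted": 27, "copyable": 28,
    "purgeable": 29, "locked": 30, "marked": 31,
}


def make_header(length, class_number, flags=()):
    """Build a 32-bit header word from its fields."""
    assert 0 <= length < (1 << HDR_LENGTH_L)
    assert 0 <= class_number < (1 << HDR_CLASS_L)
    word = (length << HDR_LENGTH_N) | (class_number << HDR_CLASS_N)
    for name in flags:
        word |= 1 << HDR_FLAGS[name]
    return word


def decode_header(word):
    """Split a header word into (length, class_number, set of flag names)."""
    length = (word >> HDR_LENGTH_N) & ((1 << HDR_LENGTH_L) - 1)
    class_number = (word >> HDR_CLASS_N) & ((1 << HDR_CLASS_L) - 1)
    flags = {name for name, bit in HDR_FLAGS.items() if word & (1 << bit)}
    return length, class_number, flags
```

For instance, `make_header(25, 0, ["locked"])` produces the same bit pattern as the `hdrLocked|contextSize` constant that the listing uses when it allocates a fast context.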



;| Contexts

        LABEL   contextHeader = 0
        LABEL   contextID = 1
;Context message and locals are in locations 2 through 15
        LABEL   contextR0 = 16
        LABEL   contextR1 = 17
        LABEL   contextR2 = 18
        LABEL   contextR3 = 19
        LABEL   contextID0 = 20
        LABEL   contextID2 = 21
        LABEL   contextID3 = 22
        LABEL   contextIP = 23
        LABEL   contextNext = 24                ;Next context in a chain.
                                                ;Also used to store NIL or next context number when waiting
                                                ;for an object, or zero when waiting for a cfuture.
        LABEL   contextSize = 25                ;Size of a fast context.
;More locals may follow here.



;| Messages

;Apply message:
        LABEL   applyHeader = 0
        LABEL   applyFunct = 1
        LABEL   applyReceiver =

;Reply message:
        LABEL   replyHeader = 0
        LABEL   replyID = 1
        LABEL   replySlot = 2
        LABEL   replyValue = 3

;RestartContext message:
        LABEL   restartHeader = 0
        LABEL   restartID = 1

;NewObject message:
        LABEL   newObjHeader = 0
        LABEL   newObjClass = 1
        LABEL   newObjReplyID = 2
        LABEL   newObjReplySlot = 3

;Dispose message:
        LABEL   disposeHeader = 0
        LABEL   disposeID = 1

;DisposeBRAT message:
        LABEL   disposeBRATHeader = 0
        LABEL   disposeBRATID = 1

;LookupMethod message:
        LABEL   LLookupMethod = ID:(1<<30|(0&mX)<<sX|(0&mY)<<sY|(0&mZ)<<sZ|(0&mS)<<sS)
        REP REV LookupMethod = LLookupMethod
        LABEL   lookMethSelector = 2
        LABEL   lookMethClass = 3
        LABEL   lookMethReplyID = 4

;MethodReply message:
        LABEL   methodReplyHeader = 0
        LABEL   methodReplyID = 1
        LABEL   methodReplyValue = 2

;RequestObject message:
        LABEL   reqObjHeader = 0
        LABEL   reqObjID = 1
        LABEL   reqObjReplyNode = 2

;UpdateHome message:
        LABEL   updtHomeHeader = 0
        LABEL   updtHomeID = 1
        LABEL   updtHomeNode = 2

;Unlock message:
        LABEL   unlockHeader = 0
        LABEL   unlockID = 1

        LABEL   msgAcknowledgeObject = 1<<offsetN



        LABEL   ADR TempDiv_Count = 4           ;Divide temporary.
        LABEL   ADR LimitOverride = 5           ;NIL or IP to which a limit fault should jump (one time only).
        LABEL   ADR FastContextQueue = 6        ;Queue of fast contexts.
        LABEL   ADR TempCH_R0 = 7               ;CompactHeap temporary.
        LABEL   ADR FirstFree = 8               ;Pointer to first free heap word.
        LABEL   ADR LastFree = 9                ;Pointer to last free heap word plus one.
        LABEL   ADR BRATFree = 10               ;Pointer to free BRAT links.
        LABEL   ADR LastObjectID = 11           ;ID of last object to be allocated.
        LABEL   ADR NextDistobjID = 12          ;ID of next distributed object to be allocated.
        LABEL   ADR SerialNode = 13             ;This node's serial number.
        LABEL   ADR NodeMask = 15               ;The nodeMask constant.
        LABEL   ADR HeapStart = 16              ;Pointer to the beginning of the relocatable heap.
        LABEL   ADR RandomSeed = 17             ;Random number seed.
        LABEL   ADR TempXLATE_R0 = 18           ;XLATE fault handler temporaries.
        LABEL   ADR TempXLATE_R1 = 19
        LABEL   ADR TempXLATE_R2 = 20
        LABEL   ADR TempXLATE_FIP = 21
        LABEL   ADR TempXLATE_FIR = 22
        LABEL   ADR TempXLATE_Temp = 23
        LABEL   ADR TempDO_FIP = 24             ;DisposeObject temporary.
        LABEL   ADR TempNC_FIP = 25             ;NewContext temporaries.
        LABEL   ADR TempNC_ID2 = 26
        LABEL   ADR TempNC_R2 = 27
        LABEL   ADR TempNC_R3 = 28
        LABEL   ADR TempCH_FIP = 29             ;CompactHeap temporaries.
        LABEL   ADR TempCH_R2 = 30
        LABEL   ADR TempCH_R3 = 31
        LABEL   ADR TempCH_A3 = 32
        LABEL   ADR TempCH_ID3 = 33
        LABEL   ADR TempCH_Lock = 34
        LABEL   ADR TempCH_Src = 35
        LABEL   ADR TempANO_FIP = 36            ;AllocNewObject temporary.
        LABEL   ADR TempEB_Key = 37             ;EnterBinding temporary.
        LABEL   ADR TempLM_FIP = 38             ;LookupMethod temporaries.
        LABEL   ADR TempINITM_Context = 39      ;InitializeMDP temporary.
        LABEL   ADR TempTOf_FIP = 40            ;ClassOf temporary.
        LABEL   ADR TempDiv_R2 = TempTOf_FIP    ;Divide temporaries.
        LABEL   ADR TempDiv_R3 = 41
        LABEL   ADR TempDiv_80000000 = 42
        LABEL   ADR TempNCl_FIP = TempDiv_R2    ;NewClosure temporary.
        LABEL   ADR TempDealloc_FIP = 43        ;DeallocateObject temporary.
Cosmos Listing


;| Fault Numbers

        LABEL   VECTOR suspend = $00
        LABEL   VECTOR blockMove = $01
        LABEL   VECTOR blockSend = $02
        LABEL   VECTOR compactHeap = $03
        LABEL   VECTOR allocObject = $04
        LABEL   VECTOR enterBinding = $05
        LABEL   VECTOR lookupBinding = $06
        LABEL   VECTOR deleteBinding = $07
        LABEL   VECTOR purgeBinding = $08
        LABEL   VECTOR newLocalObject = $09
        LABEL   VECTOR allocNextObject = $0A
        LABEL   VECTOR allocNewObject = $0B
        LABEL   VECTOR newContext = $0C
        LABEL   VECTOR disposeContext = $0D
        LABEL   VECTOR disposeObject = $0E
        LABEL   VECTOR deallocateObject = $0F
        LABEL   VECTOR newObject = $10
        LABEL   VECTOR classOf = $11
        LABEL   VECTOR typeOf = $12
        LABEL   VECTOR objectNode = $13
        LABEL   VECTOR preferredConstituent = $14
        LABEL   VECTOR CO = $15
        LABEL   VECTOR lookupMethod = $16
        LABEL   VECTOR lookupMethodU = $17
        LABEL   VECTOR divide = $18
        LABEL   VECTOR newClosure = $19
        LABEL   VECTOR callClosure = $1A

;| XLATE Fault Codes

        LABEL   objectXLATE = 0                 ;Find and bring the object here.
        LABEL   internalXLATE = 1               ;Same as localXLATE but also works for classes and selectors.
        LABEL   localXLATE = 2                  ;Return the object address, its node number, or NIL if it is a constant.
        LABEL   restoreXLATE = 3                ;Restore an address register from a saved ID value.

;| Halt Codes

        LABEL   haltFault0 = 0                  ;General priority 0 fault.
        LABEL   haltFault1 = 1                  ;General priority 1 fault.
        LABEL   haltFuture = 2                  ;Futures are not implemented yet.
        LABEL   haltOverflow = 3                ;Bignums are not implemented yet.
        LABEL   haltType = 4                    ;Overriding built-in selectors is not implemented yet.
        LABEL   haltUser = 5                    ;Halt by user program.
        LABEL   haltRange = 6                   ;Range exceeded in a primitive operation.
        LABEL   haltCall = 7                    ;Undefined system call.
        LABEL   haltInvalidA1 = 8               ;A1 invalid.
        LABEL   haltReply = 9                   ;Reply to a bad slot.
        LABEL   haltUninitVar = 13              ;An uninitialized variable was referenced.
        LABEL   haltTypeOf = 14                 ;Nonexistent or incorrectly tagged object passed to typeOf.
        LABEL   haltXLATE = 15                  ;Nonexistent or incorrectly tagged object is XLATEd.
        LABEL   haltBRATType = 16               ;An object's BRAT entry is missing or mistyped.
        LABEL   haltBRATMissing = 17            ;An object's BRAT entry is missing.
        LABEL   haltBRATDelete = 18             ;Attempt to delete a missing BRAT entry.
        LABEL   haltClassType = 19              ;Incorrectly tagged word used as a class.
        LABEL   haltInternalType = 20           ;A non-CST-tagged word used as an object.
        LABEL   haltBRATFull = 21               ;The BRAT is full.
        LABEL   haltMemFull = 22                ;Memory is full.
        LABEL   haltApply = 23                  ;Attempt to apply an incorrectly tagged word.
        LABEL   haltHeap = 24                   ;Heap is in an inconsistent state during a compaction.
        LABEL   haltLimit = 25                  ;An object's limit is exceeded.
        LABEL   haltDiv0 = 26                   ;Division by zero.






Cosmos.m

MDP Operating System
version 2.3

written by
Waldemar Horwat

Master's thesis under Prof. William Dally

March 28, 1989
May 1991

Send problems and comments to
waldemar@hx.lcs.mit.edu.

;; Copyright 1989, 1990, 1991 Waldemar Horwat

        INCLUDE "Cosmos.i"

Each routine and section of code has an attribute called criticality. A criticality is
a number between 0 and 7 with the following meanings:

  0  All operations allowed. Caller's registers not preserved.
  1  Caller's registers preserved. May suspend, so caller's globals are not preserved.
  2  No suspending faults, no modification of context state.
  3  No suspending faults, no modification of context state, no object migration.
  4  No message sends, no object migration.
  5  No heap compaction, no message sends.
  6  No faults, no heap compaction, no message sends.
  7  No priority 1 interrupts, no faults, no heap compaction, no message sends.

A routine's criticality is no greater than the criticality of any of its components,
subroutines, or fault handlers. The criticality of a fault handler can be no greater than 5.
Each fault handler's code starts at criticality 6 until the state in the fault registers is saved.
Dereferencing an address register (other than A0 in absolute mode) has criticality at most 5 unless
the code has criticality 5 and the address register is known to be valid.
A routine that uses a global must have criticality at least 2.
Faults that save registers should start in unchecked mode because a register might contain a CFUTure.

80. . NNodes-1 


MODULE 
ENTRY 


ALL 








        ORG     $400                    ;Reset IPB location
        DC      InitializeMDP-"+2>      ;Go initialize!
        BR      R0

        ORG     BRATEnd
OSStart:

;#######################
;##                   ##
;##  Fault Handlers   ##
;##                   ##
;#######################



;| Crash on a general priority 0 or 1 or undefined system call fault.

Crash0:         HALT    haltFault0
Crash1:         HALT    haltFault1
CrashCall:      HALT    haltCall

fltCrash0       = IP:abs|fault|unchecked|Crash0<<offsetN
fltCrash1       = IP:abs|fault|unchecked|Crash1<<offsetN
fltCrashCall    = IP:abs|fault|unchecked|CrashCall<<offsetN

;| Crash on a general future or type fault.

CrashFuture:    HALT    haltFuture
CrashType:      HALT    haltType

fltCrashFuture  = IP:abs|fault|unchecked|CrashFuture<<offsetN
fltCrashType    = IP:abs|fault|unchecked|CrashType<<offsetN



;| Handle the early or send fault by re-trying the operation.
;|
;|Criticality 5.
;|
RetryHandler:   MOVE    R0,FOP0                 ;Save R0. Criticality 6.
                MOVE    FIP,R0                  ;Back up FIP by one instruction.
                ROT     R0,-phaseN,R0
                SUB     R0,1,R0
                ROT     R0,phaseN,R0
                MOVE    R0,FIP
                MOVE    FOP0,R0                 ;Restore R0.
                MOVE    FIP,IP

fltEarly        = IP:abs|fault|unchecked|RetryHandler<<offsetN
fltSend         = IP:abs|fault|unchecked|RetryHandler<<offsetN
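
The ROT/SUB/ROT sequence in the retry handler backs the FIP up by one instruction without disturbing the other IP bits. Rotating right by phaseN (9) brings the phase bit to bit 0 with the word offset immediately above it, so subtracting 1 steps back by half a word, that is, one instruction. The Python model below illustrates the idea; the helper names are hypothetical, the bit positions come from the IP equates in Cosmos.i, and the sketch assumes 32-bit data words and that the offset/phase counter is not already zero.

```python
# Illustrative model of "back up FIP by one instruction"; names are hypothetical.
MASK32 = 0xFFFFFFFF
PHASE_N = 9  # phaseN: the phase bit; the word offset starts at bit 10


def rotl32(value, amount):
    """Rotate a 32-bit value left by amount (mod 32); negative rotates right."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def back_up_one_instruction(ip):
    """Mirror ROT ip,-phaseN / SUB 1 / ROT phaseN from the handler."""
    rotated = rotl32(ip, -PHASE_N)   # phase bit now at bit 0, offset above it
    rotated = (rotated - 1) & MASK32  # step back one half-word
    return rotl32(rotated, PHASE_N)   # restore the original bit positions
```

An IP pointing at the second instruction of word 5 backs up to the first instruction of word 5, and one more step goes to the second instruction of word 4; the abs, fault, and unchecked bits are carried through unchanged.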



;| Handle a limit fault. Halt unless LimitOverride was set, in which case clear it and
;| jump to the override routine. R0 and R1 are altered when LimitOverride is used.
;|
;|Criticality 5.
;|
LimitHandler:   MOVE    [LimitOverride,A0],R0   ;Criticality 6.
                BNIL    R0,^Limit_Halt          ;Halt unless LimitOverride was set.
                MOVE    NIL,R1                  ;Clear LimitOverride.
                MOVE    R1,[LimitOverride,A0]
                MOVE    R0,IP                   ;Go to the override routine.
Limit_Halt:     HALT    haltLimit

fltLimit        = IP:abs|fault|unchecked|LimitHandler<<offsetN



;| Handle a CFUTure fault.
;|
;|Criticality 1.
;|
;| Save the current state in the context, suspend, and allocate a new fast context for
;| the next message. Various entry points are provided depending on how much state has
;| to be saved. These routines do not return.
;|
;|Entry: SaveStateID023 (ID0, ID2, ID3, and the message, if any, have to be saved.)
;|Entry: SaveStateID03  (ID0, ID3, and the message, if any, have to be saved.)
;|Entry: SaveState      (No registers have to be saved.)
;|
;|Unchecked absolute non-fault mode required.
;|
CFUT_Halt:      HALT    haltUninitVar           ;An uninitialized variable was referenced.
CFUTHandler:    MOVE    R0,[contextR0,A1]       ;Save R0 and R1. Criticality 6.
                MOVE    R1,[contextR1,A1]
                MOVE    R2,[contextR2,A1]
                MOVE    R3,[contextR3,A1]
                MOVE    FOP0,R3
        IF      DEBUG
                GT      R3,0,R0
                BF      R0,^CFUT_Halt           ;Halt if an uninitialized variable was referenced.
        END
                MOVE    R3,[contextNext,A1]
                MOVE    FIP,R1                  ;Back up FIP by one instruction.
                MOVE    R1,F                    ;Criticality 2.
                ROT     R1,-phaseN,R1
                SUB     R1,1,R1
                ROT     R1,phaseN,R1
                MOVE    R1,[contextIP,A1]       ;Save IP, R2, R3, ID0, ID2, and ID3 in the context.
SaveStateID023: MOVE    ID2,R0                  ;Save ID0, ID2, and ID3 in the current context.
                MOVE    R0,[contextID2,A1]
SaveStateID03:  MOVE    ID0,R0                  ;Save ID0 and ID3 in the current context.
                MOVE    R0,[contextID0,A1]
                MOVE    Q,R0                    ;Check whether the message should be copied into the context.
                BF      R0,^SaveState_Msg       ;Don't copy if A3 didn't point into the queue.
                MOVE    16,R0                   ;Copy the message into the context.
                SUB     R0,[0,A3],R0
                AND     R0,lengthM,R0
                BR      R0                      ;Jump into the appropriate place in the copy code.
                MOVE    [15,A3],R0
                MOVE    R0,[15,A1]
                MOVE    [14,A3],R0
                MOVE    R0,[14,A1]
                MOVE    [13,A3],R0
                MOVE    R0,[13,A1]
                MOVE    [12,A3],R0
                MOVE    R0,[12,A1]
                MOVE    [11,A3],R0
                MOVE    R0,[11,A1]
                MOVE    [10,A3],R0
                MOVE    R0,[10,A1]
                MOVE    [9,A3],R0
                MOVE    R0,[9,A1]
                MOVE    [8,A3],R0
                MOVE    R0,[8,A1]
                MOVE    [7,A3],R0
                MOVE    R0,[7,A1]
                MOVE    [6,A3],R0
                MOVE    R0,[6,A1]
                MOVE    [5,A3],R0
                MOVE    R0,[5,A1]
                MOVE    [4,A3],R0
                MOVE    R0,[4,A1]
                MOVE    [3,A3],R0
                MOVE    R0,[3,A1]
                MOVE    [2,A3],R0
                MOVE    R0,[2,A1]
                MOVE    ID1,R0
                MOVE    R0,ID3
SaveState_Msg:  MOVE    ID3,R0
                MOVE    R0,[contextID3,A1]
SaveState:      MOVE    [FastContextQueue,A0],R0 ;Allocate a new fast context.
                BNIL    R0,^AllocFastContext    ;There are no more.
                XLATE   R0,objectXLATE,A1
                MOVE    [contextNext,A1],R0     ;Unlink it.
                MOVE    R0,[FastContextQueue,A0]
                SUSPEND                         ;Criticality 1.


;| Allocate and initialize a new fast context to be used by the next message. This
;| routine does not return.
;|
;|Entry: AllocFastContext
;|
;|Unchecked absolute non-fault mode required.
;|
AllocFastContext: DC    OBJ:hdrLocked|contextSize
                CALL    allocNextObject         ;Create the context object.
                MOVE    ID2,R1                  ;Point A1 and ID1 to the new context.
                XLATE   R1,objectXLATE,A1
                SUSPEND



;| Suspend; if a slow context was used, deallocate it and replace it with a fast one.
;| This routine does not return.
;|
;|Call: suspend
;|In:   A1/ID1  Context.
;|Criticality 0.
;|
Suspend:        MOVE    [contextHeader,A1],R0   ;Criticality 3.
                ROT     R0,-hdrFastN,R0         ;Check whether this was a fast context.
                BT      R0,^Suspend_Fast        ;Yes.
                MOVE    ID1,R0                  ;No. Dispose this context and allocate a new one.
                CALL    disposeObject
                BR      ^SaveState
Suspend_Fast:   SUSPEND

fltCFUT         = IP:abs|fault|unchecked|CFUTHandler<<offsetN
fltSuspend      = IP:abs|unchecked|Suspend<<offsetN



;| Handle an INVADR fault. If the object is on this node, store its address in the
;| address register; if it is not on this node, bring it here.
;|
;|Criticality 1.
;|
INVADRHandler:  MOVE    R1,[TempXLATE_R1,A0]    ;Save R1. Criticality 6.
                MOVE    FIR,R1                  ;FIR contains the correct address register number,
                AND     R1,3,R1                 ;even when FIR=NIL.
                LSH     R1,2,R1
                BNZ     R1,R1                   ;Go to one of four handlers.
                MOVE    ID0,R1                  ;Check the xlate cache first.
                PROBE   R1,R1
                BNIL    R1,^INVADR_Miss0        ;Jump into the objectXLATE handler if missed.
                MOVE    R1,A0
                MOVE    FIR,R1                  ;If the FIR was NIL, don't back up the FIP.
                BNIL    R1,^INVADR_Rstrt2
                MOVE    FIP,R1
                BR      ^INVADR_Restart
                HALT    haltInvalidA1
INVADR_Miss:    MOVE    R0,[TempXLATE_R0,A0]    ;Save R0, R2, and FIP.
                MOVE    FIP,R0
                MOVE    R0,[TempXLATE_FIP,A0]
                MOVE    R2,[TempXLATE_R2,A0]
                MOVE    R0,F                    ;Criticality 5.
                BR      ^XLATE_ToObject         ;Jump into the objectXLATE handler if missed.
                MOVE    ID2,R1                  ;Check the xlate cache first.
                PROBE   R1,R1
                BNIL    R1,^INVADR_Miss2        ;Jump into the objectXLATE handler if missed.
                MOVE    R1,A2
                MOVE    FIP,R1
                BR      ^INVADR_Restart
INVADR_Miss2:   MOVE    ID2,R1
                BR      ^INVADR_Miss
                MOVE    ID3,R1                  ;Check the xlate cache first.
                PROBE   R1,R1
                BNIL    R1,^INVADR_Miss3        ;Jump into the objectXLATE handler if missed.
                MOVE    R1,A3
                MOVE    FIP,R1
INVADR_Restart: ROT     R1,-phaseN,R1           ;Restart the instruction.
                SUB     R1,1,R1
                ROT     R1,phaseN,R1
                MOVE    R1,FIP
INVADR_Rstrt2:  MOVE    [TempXLATE_R1,A0],R1
                MOVE    FIP,IP
INVADR_Miss0:   MOVE    FIR,R1                  ;Advance the FIP if the FIR was NIL.
                BNNIL   R1,^INVADR_M0_2
                MOVE    FIP,R1
                ROT     R1,-phaseN,R1
                ADD     R1,1,R1
                ROT     R1,phaseN,R1
                MOVE    R1,FIP
INVADR_M0_2:    MOVE    ID0,R1
                BR      ^INVADR_Miss
INVADR_Miss3:   MOVE    ID3,R1
                BR      ^INVADR_Miss



Handle an XLATE fault.

Two bits of the instruction are used to determine what to do. The possible actions
are:
  objectXLATE:   Return an ADDR containing the object's address. If the object is not
                 on this node, bring it here. Rs must be an ID, a DID, a class, or a
                 selector.
  localXLATE:    If Rs represents an object on this node, return its address; if Rs is
                 a constant, return NIL; otherwise, return the number of a node likely to
                 contain the object. This mode can be used only when Rd is a data register.
                 Rs must be an ID, DID, a class, a selector, or a constant.
  internalXLATE: Same as localXLATE except that it treats futures as if they were
                 objects instead of constants.
  restoreXLATE:  Invalidate Rd by storing an invalid address there. Of course, if
                 the XLATE table hits, the value associated with Rs is stored in Rd instead.
XLATE should be made to fault on FUTures or CFUTures; this can be accomplished by
calling XLATE in checked mode.

The criticalities are as follows:
  objectXLATE:   Criticality 1 (criticality 5 if the object is known to reside on this node).
  internalXLATE: Criticality 5.
  localXLATE:    Criticality 5.
  restoreXLATE:  Criticality 5.
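
As a rough illustration of the "two bits of the instruction" dispatch, the following Python sketch decodes a faulting instruction word the way the handler's ROT -9 / AND 3 / AND 7 sequence appears to. The field positions and the order of the four modes are read off the listing's branch table and are assumptions, not a statement of the MDP instruction encoding; `decode_xlate` and `MODES` are hypothetical names.

```python
# Assumed mode order, taken from the order of the handler's branch table.
MODES = ("objectXLATE", "internalXLATE", "localXLATE", "restoreXLATE")


def decode_xlate(fir: int) -> tuple:
    """Split a faulting XLATE instruction word into (mode name, destination mode)."""
    mode = (fir >> 9) & 0x3  # two mode bits select one of the four actions
    dest = fir & 0x7         # low three bits: destination addressing mode
    return MODES[mode], dest
```

For example, `decode_xlate((2 << 9) | 5)` selects the local mode with destination mode 5 under these assumptions.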



XLATEHandler:       MOVE    R0,[TempXLATE_R0,A0]  ;Save R0, R1, R2, FIP, and FIR. Criticality 6.
                    MOVE    R1,[TempXLATE_R1,A0]
                    MOVE    R2,[TempXLATE_R2,A0]
                    MOVE    FIP,R0
                    MOVE    R0,[TempXLATE_FIP,A0]
                    MOVE    FIR,R2
                    MOVE    FOP1,R1
                    MOVE    R0,F  ;Criticality 5.
                    ROT     R2,-9,R0
                    AND     R2,7,R2  ;Save the destination addressing mode in R2.
                    AND     R0,3,R0
                    BR      R0
                    BR      ^XLATE_ToObject  ;Get the object.
                    BR      ^XLATE_Internal  ;Go to the internal code.
                    BR      ^XLATE_Local  ;Go to the local code.
                    DC      ADDR:rel|invalid  ;RestoreXLATE: Invalidate the address register.
XLATE_Result:       ROT     R2,1,R2  ;Store R0 in the destination of the XLATE. R1 contains the
                    BR      R2  ;value of Rs and is stored in the ID register. R2 contains the
                    MOVE    [TempXLATE_R2,A0],R2  ;addressing mode from the XLATE instruction.
                    MOVE    [TempXLATE_R1,A0],R1
                    MOVE    [TempXLATE_FIP,A0],IP
                    MOVE    R0,R1
                    MOVE    [TempXLATE_R2,A0],R2
                    MOVE    [TempXLATE_R0,A0],R0
                    MOVE    [TempXLATE_FIP,A0],IP
                    MOVE    R0,R2
                    MOVE    [TempXLATE_R1,A0],R1
                    MOVE    [TempXLATE_R0,A0],R0
                    MOVE    [TempXLATE_FIP,A0],IP
                    MOVE    R0,R3
                    BR      ^XLATE_R_Done
                    DC
                    MOVE    R0,A0
                    BR      ^XLATE_R_Done
                    DC
                    MOVE    R0,A1
                    BR      ^XLATE_R_Done
                    DC
                    MOVE    R0,A2
                    BR      ^XLATE_R_Done
                    DC
                    MOVE    R0,A3
                    MOVE    R1,ID3
XLATE_R_Done:       MOVE    [TempXLATE_R2,A0],R2
                    MOVE    [TempXLATE_R1,A0],R1
                    MOVE    [TempXLATE_R0,A0],R0
                    MOVE    [TempXLATE_FIP,A0],IP
XLATE_L_TAG0:       AND     R0,subtagM,R0
                    SUB     R0,subCLASS,R0
                    BZ      R0,^XLATE_I_SC  ;If the value is a class, pretend it is an ID.
                    SUB     R0,subSEL-subCLASS,R0
                    BZ      R0,^XLATE_I_SC
                    BR      ^XLATE_L_NIL
XLATE_ToObject:     CHECK   R1,ID,R0
                    BR      ^XLATE_Object
XLATE_Internal:     CHECK   R1,FUT,R0  ;XLATE_Internal is the same as XLATE_Local for values which
                    BT      R0,^XLATE_I_SC  ;aren't futures.
XLATE_Local:        RTAG    R1,R0  ;Dispatch on the tag of the object.
                    BR      R0
                    ROT     R1,-subtagN,R0  ;TAG0
                    BR      ^XLATE_L_TAG0
XLATE_L_NIL:        MOVE    NIL,R0  ;INT
                    BR      ^XLATE_Result
                    MOVE    NIL,R0  ;BOOL
                    BR      ^XLATE_Result
                    HALT    haltXLATE  ;ADDR
                    HALT    haltXLATE  ;IP
                    HALT    haltXLATE  ;MSG / OBJ
                    HALT    haltXLATE  ;CFUT
                    HALT    haltXLATE  ;FUT
XLATE_I_SC:         MOVE    R2,[TempXLATE_FIR,A0]  ;ID. Save the FIR.
                    BR      ^XLATE_L_ID
                    MOVE    R2,[TempXLATE_FIR,A0]  ;DID. Save the FIR.
                    BR      ^XLATE_L_DID
                    MOVE    NIL,R0  ;TAGA
                    BR      ^XLATE_Result
                    MOVE    NIL,R0  ;FLOAT
                    BR      ^XLATE_Result
                    HALT    haltXLATE  ;INST0
                    HALT    haltXLATE  ;INST1
                    HALT    haltXLATE  ;INST2
XLATE_Halt:         HALT    haltXLATE  ;INST3
XLATE_L_DID:        MOVE    R1,R2  ;Save the DID.
                    CALL    preferredConstituent  ;Get an ID from the DID.
                    PROBE   R1,R0  ;Check if the constituent ID is in the cache.
                    BNIL    R0,^XLATE_L_ID
                    ENTER   R2,R0  ;If so, enter and return it.
                    BR      ^XLATE_L_2
XLATE_L_ID:         CALL    lookupBinding  ;Look for a binding of the object on
                    BNIL    R0,^XLATE_L_Miss  ;this node.
                    CHECK   R0,INT,R2  ;If an integer was found, it is the object's current
                    BT      R2,^XLATE_L_2  ;node number.
                    CHECK   R0,ADDR,R2
                    BF      R2,^XLATE_L_Cxt
                    MOVE    R0,[TempXLATE_Temp,A0]  ;Save R0.
                    ROT     R0,-baseN,R2  ;Fetch the object's header and clear the marked flag in it.
                    AND     R2,baseM,R2
                    MOVE    [R2,A0],R0
                    OR      R0,hdrMarked,R0
                    XOR     R0,hdrMarked,R0
                    MOVE    R0,[R2,A0]
                    MOVE    [TempXLATE_Temp,A0],R0  ;Restore R0.
                    ENTER   R1,R0  ;Found such a binding. Enter it in the XLATE table.
                    BR      ^XLATE_L_2
XLATE_L_Miss:       MOVE    [NodeMask,A0],R0  ;Did not find a binding. Extract the node number from
                    AND     R0,R1,R0  ;the ID and go return it.
                    MOVE    NNR,R2  ;If the node number is this node, halt because this is
                    EQUAL   R0,R2,R2  ;supposed to be the home node, yet it doesn't know where the
                    BT      R2,^XLATE_Halt  ;object is.
XLATE_L_2:          MOVE    [TempXLATE_FIR,A0],R2  ;Go return the result in R0.
                    BR      ^XLATE_Result
XLATE_L_Cxt:        CHECK   R0,ID,R2  ;The ID is bound to a context. Dereference the context and
                    BF      R2,^XLATE_Halt  ;read its contextNext field.
                    MOVE    R1,[TempXLATE_Temp,A0]  ;Save the object ID.
                    MOVE    R0,R1
                    CALL    lookupBinding
                    MOVE    [TempXLATE_Temp,A0],R1  ;Restore the object ID.
                    CHECK   R0,ADDR,R2
                    BF      R2,^XLATE_Halt
                    ROT     R0,-baseN,R2
                    AND     R2,baseM,R2
                    MOVE    contextNext,R0
                    ADD     R2,R0,R2
                    MOVE    [R2,A0],R0
                    BNIL    R0,^XLATE_L_Miss  ;Miss if it was NIL.
                    CHECK   R0,INT,R2  ;If an integer was found, it is the object's current
                    BF      R2,^XLATE_L_Cxt  ;node number; otherwise there is another context linked.
                    BR      ^XLATE_L_2
XLATE_O_Access:     MOVE    R0,[TempXLATE_Temp,A0]
                    ROT     R0,-baseN,R2  ;Fetch the object's header and clear the marked flag in it.
                    AND     R2,baseM,R2
                    MOVE    [R2,A0],R0
                    OR      R0,hdrMarked,R0
                    XOR     R0,hdrMarked,R0
                    MOVE    R0,[R2,A0]
                    MOVE    [TempXLATE_Temp,A0],R0  ;Restore R0.
                    ENTER   R1,R0  ;Found a binding. Enter it in the XLATE table and
                    MOVE    [TempXLATE_R2,A0],R2  ;restart the XLATE (or address-faulted) instruction.
XLATE_Rebind:       DC      phase
                    MOVE    [TempXLATE_FIP,A0],R1
                    SUB     R1,R0,R1
                    MOVE    R1,[TempXLATE_FIP,A0]
                    MOVE    [TempXLATE_R1,A0],R1
                    MOVE    [TempXLATE_R0,A0],R0



                    MOVE    [TempXLATE_FIP,A0],IP
                    IF      DEBUG
XLATE_O_2:          CHECK   R1,TAG0,R0  ;Classes and selectors are also objects and are
                    BF      R0,^XLATE_Halt_2  ;treated as if they were ID's.
                    ROT     R1,-subtagN,R0
                    AND     R0,subtagM,R0
                    SUB     R0,subCLASS,R0
                    BZ      R0,^XLATE_O_ID
                    SUB     R0,subSEL-subCLASS,R0
                    BZ      R0,^XLATE_O_ID
                    END
XLATE_Halt_2:       HALT    haltXLATE
XLATE_Object:       BT      R0,^XLATE_O_ID  ;Dispatch on the tag of the object.
                    CHECK   R1,DID,R0
                    BF      R0,^XLATE_O_2
                    MOVE    R1,R2  ;Save the DID.
                    CALL    preferredConstituent  ;Get an ID from the DID.
                    PROBE   R1,R0  ;Check if the constituent ID is in the cache.
                    BNIL    R0,^XLATE_O_ID
                    ENTER   R2,R0  ;If so, enter and return it.
                    MOVE    [TempXLATE_R2,A0],R2
                    BR      ^XLATE_Rebind
                    IF      !DEBUG
XLATE_O_2:
                    END
XLATE_O_ID:         CALL    lookupBinding  ;Look for a binding of the object.
                    BNIL    R0,^XLATE_O_Miss
                    MOVE    R2,[TempXLATE_Temp,A0]  ;Save the binding's address.
                    CHECK   R0,ADDR,R2  ;If the object's address was found, return it.
                    BT      R2,^XLATE_O_Access
                    CHECK   R0,INT,R2  ;If the object's current node was found, send a requestObject
                    BT      R2,^XLATE_O_Point  ;message there.
                    MOVE    [TempXLATE_Temp,A0],R2  ;Otherwise someone is already waiting for the object.
                    BR      ^XLATE_O_Fetch  ;Append this context to the waiting queue.
XLATE_O_Point:      MOVE    [TempXLATE_Temp,A0],R2  ;Send a message requesting the object to the object's
                    SEND0   R0  ;current location.
                    DC      MSG:msgRequestObject+3
                    SEND0   R0
                    MOVE    NNR,R0
                    SEND2E0 R1,R0
                    BR      ^XLATE_O_Fetch
XLATE_O_Miss:       DC      MSG:msgRequestObject+3  ;Send a message requesting the object to the object's
                    AND     R1,[NodeMask,A0],R2  ;home.
                    SEND20  R2,R0
                    MOVE    NNR,R0
                    SEND2E0 R1,R0
                    EQUAL   R0,R2,R2  ;However, if this node is supposed to be the object's home,
                    BT      R2,^XLATE_Halt_2  ;halt because the object doesn't appear to exist.
                    MOVE    NIL,R2  ;R2 being NIL means no one else is waiting for the object.
XLATE_O_Fetch:      MOVE    [TempXLATE_R0,A0],R0  ;Save state in the context.
                    MOVE    R0,[contextR0,A1]  ;Criticality 3.
                    MOVE    [TempXLATE_R1,A0],R0
                    MOVE    R0,[contextR1,A1]
                    MOVE    [TempXLATE_R2,A0],R0
                    MOVE    R0,[contextR2,A1]
                    MOVE    [TempXLATE_FIP,A0],R0
                    ROT     R0,-phaseN,R0  ;Back up IP to point to the XLATE instruction.
                    SUB     R0,1,R0
                    ROT     R0,phaseN,R0
                    MOVE    R0,[contextIP,A1]  ;Save IP, R0-R3, ID0, ID2, and ID3 in the context.
                    MOVE    R3,[contextR3,A1]
                    MOVE    ID2,R0
                    MOVE    R0,[contextID2,A1]
                    MOVE    ID1,R0  ;Make a binding indicating that the context in ID1 is
                    BNNIL   R2,^XLATE_O_Append  ;waiting for the object in R1.
                    MOVE    R2,[contextNext,A1]
                    CALL    enterBinding
XLATE_Suspend:      DC      SaveStateID03-(*+2)  ;Save the rest of the state and suspend.
                    BR      R0
XLATE_O_Append:     MOVE    [R2,A0],R3  ;Append the binding to the linked list headed by R2.
                    MOVE    R3,[contextNext,A1]
                    MOVE    R0,[R2,A0]
                    BR      ^XLATE_Suspend

fltINVADR = IP:abs|fault|unchecked|INVADRHandler<<offsetN
fltXLATE = IP:abs|fault|unchecked|XLATEHandler<<offsetN
## Heap Manager ##



Copy the object pointed to by A3 into the object pointed to by A2. The copy stops as
soon as a limit fault is reached. A2 and A3 are guaranteed not to be XLATEd, so they do
not have to correspond to the values in the ID registers.
The objects are copied from the bottom up, so, if they overlap, the destination must
start before the source.
R0 can be a number smaller than 32 indicating the offset of the first word in each
object which should be copied; words with indices smaller than R0 are not copied.

Call: blockMove

In: R0 Offset of first word to copy.
    A2 Destination object pointer.
    A3 Source object pointer.

Criticality 5.

Alters R0/R1.
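
The copy semantics described above can be sketched in Python as follows. This is only an illustration under stated assumptions: memory is modeled as a flat list, the limit fault is modeled by explicit length checks, and `block_move` with its parameters is a hypothetical name rather than anything in Cosmos.

```python
def block_move(mem, dst, dst_len, src, src_len, first=0):
    """Copy words bottom-up from the source object to the destination object.

    mem      -- flat list standing in for node memory (assumption)
    dst, src -- base indices of the destination and source objects
    first    -- offset of the first word to copy (the role R0 plays in the listing)
    Returns the number of words copied.
    """
    i = first
    # The length checks stand in for the limit fault that ends the real routine.
    while i < src_len and i < dst_len:
        mem[dst + i] = mem[src + i]
        i += 1
    return max(i - first, 0)
```

Because the loop runs from low offsets to high ones, an overlapping move is safe only when the destination starts before the source, which matches the restriction stated in the header.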



BlockMove:          MOVE    FIP,R1  ;Criticality 6.
                    MOVE    R1,[LimitOverride,A0]  ;Override the Limit fault for the duration of this routine.
                    MOVE    R1,F  ;Criticality 5.
                    BR      R0
                    MOVE    [0,A3],R0
                    MOVE    R0,[0,A2]
                    MOVE    [1,A3],R0
                    MOVE    R0,[1,A2]
                    MOVE    [2,A3],R0
                    MOVE    R0,[2,A2]
                    MOVE    [3,A3],R0
                    MOVE    R0,[3,A2]
                    MOVE    [4,A3],R0
                    MOVE    R0,[4,A2]
                    MOVE    [5,A3],R0
                    MOVE    R0,[5,A2]
                    MOVE    [6,A3],R0
                    MOVE    R0,[6,A2]
                    MOVE    [7,A3],R0
                    MOVE    R0,[7,A2]
                    MOVE    [8,A3],R0
                    MOVE    R0,[8,A2]
                    MOVE    [9,A3],R0
                    MOVE    R0,[9,A2]
                    MOVE    [10,A3],R0
                    MOVE    R0,[10,A2]
                    MOVE    [11,A3],R0
                    MOVE    R0,[11,A2]
                    MOVE    [12,A3],R0
                    MOVE    R0,[12,A2]
                    MOVE    [13,A3],R0
                    MOVE    R0,[13,A2]
                    MOVE    [14,A3],R0
                    MOVE    R0,[14,A2]
                    MOVE    [15,A3],R0
                    MOVE    R0,[15,A2]
                    MOVE    [16,A3],R0
                    MOVE    R0,[16,A2]
                    MOVE    [17,A3],R0
                    MOVE    R0,[17,A2]
                    MOVE    [18,A3],R0
                    MOVE    R0,[18,A2]
                    MOVE    [19,A3],R0
                    MOVE    R0,[19,A2]
                    MOVE    [20,A3],R0
                    MOVE    R0,[20,A2]
                    MOVE    [21,A3],R0
                    MOVE    R0,[21,A2]
                    MOVE    [22,A3],R0
                    MOVE    R0,[22,A2]
                    MOVE    [23,A3],R0
                    MOVE    R0,[23,A2]
                    MOVE    [24,A3],R0
                    MOVE    R0,[24,A2]
                    MOVE    [25,A3],R0
                    MOVE    R0,[25,A2]
                    MOVE    [26,A3],R0
                    MOVE    R0,[26,A2]
                    MOVE    [27,A3],R0
                    MOVE    R0,[27,A2]
                    MOVE    [28,A3],R0
                    MOVE    R0,[28,A2]
                    MOVE    [29,A3],R0
                    MOVE    R0,[29,A2]
                    MOVE    [30,A3],R0
                    MOVE    R0,[30,A2]
                    MOVE    [31,A3],R0
                    MOVE    R0,[31,A2]
                    MOVE    32,R1
BM_MoveRest:        MOVE    [R1,A3],R0  ;Move the rest of the object.
                    MOVE    R0,[R1,A2]
                    ADD     R1,1,R1
                    MOVE    [R1,A3],R0
                    MOVE    R0,[R1,A2]
                    ADD     R1,1,R1
                    MOVE    [R1,A3],R0
                    MOVE    R0,[R1,A2]
                    ADD     R1,1,R1
                    MOVE    [R1,A3],R0
                    MOVE    R0,[R1,A2]
                    BR      ^BM_MoveRest

fltBlockMove = IP:abs|fault|unchecked|BlockMove<<offsetN



Send the object pointed to by A2. The send stops as soon as a limit fault is reached.
A2 is guaranteed not to be XLATEd, so it does not have to correspond to the value in
ID2. R0 should be one of the following:

  Words are sent starting from offset 1 in the object.
  Words are sent starting from offset 3 in the object.
  Words are sent starting from offset 5 in the object.

Call: blockSend

In: R0 Encoded offset of first word to send.
    A2 Source object pointer.

Criticality 5.

Alters R0/R1/A2.

BlockSend:          MOVE    FIP,R1  ;Criticality 6.
                    MOVE    R1,[LimitOverride,A0]  ;Override the Limit fault for the duration of this routine.
                    MOVE    R1,F  ;Criticality 5.
                    BR      R0
                    SEND0   [1,A2]
                    SEND0   [2,A2]
                    SEND0   [3,A2]
                    SEND0   [4,A2]
                    SEND0   [5,A2]
                    SEND0   [6,A2]
                    SEND0   [7,A2]
                    SEND0   [8,A2]
                    SEND0   [9,A2]
                    SEND0   [10,A2]
                    SEND0   [11,A2]
                    SEND0   [12,A2]
                    SEND0   [13,A2]
                    SEND0   [14,A2]
                    SEND0   [15,A2]
                    SEND0   [16,A2]
                    SEND0   [17,A2]
                    SEND0   [18,A2]
                    SEND0   [19,A2]
                    SEND0   [20,A2]
                    SEND0   [21,A2]
                    SEND0   [22,A2]
                    SEND0   [23,A2]
                    SEND0   [24,A2]
                    SEND0   [25,A2]
                    SEND0   [26,A2]
                    SEND0   [27,A2]
                    SEND0   [28,A2]
                    SEND0   [29,A2]
                    SEND0   [30,A2]
                    SEND0   [31,A2]
                    MOVE    32,R0
BS_SendRest:        SEND0   [R0,A2]  ;Send more words of the object.
                    ADD     R0,1,R0
                    SEND0   [R0,A2]
                    ADD     R0,1,R0
                    SEND0   [R0,A2]
                    ADD     R0,1,R0
                    SEND0   [R0,A2]
                    ADD     R0,1,R0
                    BR      ^BS_SendRest

fltBlockSend = IP:abs|fault|unchecked|BlockS



Compact the node's heap, trying to free at least R0 words of memory. Halt if this
much memory is not available.

Call: compactHeap

In: R0 Number of words needed.

Criticality 3.

Alters R0/R1/AID2.
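
The per-object policy that the listing's comments describe (locked objects live, deleted objects die, marked objects are purged, purgeable objects are marked for the next scan, and the routine gives up after two successive scans) can be sketched in Python. This is a simplified model under assumptions: objects are records in a list rather than words in a heap, `capacity` is a made-up parameter, and BRAT and XLATE updates are omitted. `Obj`, `compact_pass` and `compact_heap` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    oid: int
    length: int
    locked: bool = False
    deleted: bool = False
    purgeable: bool = False
    marked: bool = False


def compact_pass(heap):
    """One scan: drop dead objects and keep the survivors in order."""
    live = []
    for o in heap:
        if not o.locked:
            if o.deleted or o.marked:
                continue         # killed, or purged because it was marked
            if o.purgeable:
                o.marked = True  # becomes a purge candidate for the next scan
        live.append(o)
    return live, sum(o.length for o in live)


def compact_heap(heap, capacity, needed):
    """Compact at most twice; fail if `needed` words still cannot be freed."""
    for _ in range(2):
        heap, used = compact_pass(heap)
        if capacity - used >= needed:
            return heap
    raise MemoryError("heap full after two compactions")
```

In the listing the XLATE handler clears the marked flag whenever an object is accessed, so a purgeable object is only purged if it goes untouched between two scans; the sketch leaves that interaction to the caller.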



CompactHeap:        INVAL   ;Criticality 6. Invalidate all relocatable address registers.
                    MOVE    R0,[TempCH_R0,A0]  ;Save FIP, R0, R2, R3, Q, A3, and ID3.
                    MOVE    R2,[TempCH_R2,A0]
                    MOVE    R3,[TempCH_R3,A0]
                    MOVE    FIP,R0
                    MOVE    R0,[TempCH_FIP,A0]
                    MOVE    R0,F  ;Criticality 3.
                    MOVE    Q,R0
                    BT      R0,^CH_0
                    MOVE    ID3,R0
CH_0:               MOVE    R0,[TempCH_ID3,A0]
                    MOVE    A3,R0
                    MOVE    R0,[TempCH_A3,A0]
                    MOVE    NIL,R3  ;R3 will contain NIL for the duration of the xlate flush.
                    MOVE    R3,[TempCH_Lock,A0]  ;Indicate that this is the first time the heap is compacted.
                    MOVE    R3,Q  ;Disable queue wraparound.
                    DC      IP:abs|unchecked|CH_2<<offsetN
                    MOVE    R0,[LimitOverride,A0]  ;Override the Limit fault for the duration of this routine.
                    MOVE    -1,R2
                    DC      ADDR:(XlateStart<<baseN)+XlateEnd-XlateStart
                    MOVE    R0,A2
CH_FlushXlate:      ADD     R2,2,R2  ;Check every entry in the XLATE table whether it contains
                    MOVE    [R2,A2],R0  ;a relocatable ADDR. If it does, replace it with NIL.
                    CHECK   R0,ADDR,R1
                    BF      R1,^CH_FlushXlate
                    AND     R0,rel,R1
                    BZ      R1,^CH_FlushXlate
                    MOVE    R3,[R2,A2]
                    BR      ^CH_FlushXlate
CH_2:               MOVE    [HeapStart,A0],R2  ;R2 is the source heap scanner.
                    MOVE    R2,R3  ;R3 is the destination heap scanner.
                    BR      ^CH_Compact
CH_Compact2:        MOVE    [R2,A0],R0  ;Get the next object at the source.
                    ROT     R0,-hdrLockedN,R1  ;Let it live if it is locked.
                    BT      R1,^CH_Live
                    ROT     R0,-hdrDeletedN,R1  ;Kill it if it is deleted.
                    BT      R1,^CH_Die
                    ROT     R0,-hdrMarkedN,R1  ;Purge it if it is marked.
                    BF      R1,^CH_MarkLive
CH_Die:             ADD     R2,1,R1  ;Read the object's ID into R1.
                    MOVE    [R1,A0],R1
                    CALL    deleteBinding  ;No need to purge the xlate table.
                    MOVE    [R2,A0],R0  ;Skip the source heap scanner past the removed object.
                    AND     R0,hdrLengthM,R0
                    ADD     R2,R0,R2
CH_Compact:         GE      R2,[FirstFree,A0],R0  ;Check whether the entire heap was scanned.
                    BF      R0,^CH_Compact2
                    EQUAL   R2,[FirstFree,A0],R0  ;If so, then R2 must match FirstFree exactly.
                    BF      R0,^CH_AlignError
                    MOVE    R3,[FirstFree,A0]  ;Update FirstFree.
                    MOVE    [LastFree,A0],R0  ;Check whether there is now enough room to satisfy the allocation
                    SUB     R0,R3,R0  ;request.
                    GE      R0,[TempCH_R0,A0],R0
                    BT      R0,^CH_Done  ;Leave if so.
                    MOVE    [TempCH_Lock,A0],R0  ;If not, compact the heap again unless it was just compacted.
                    MOVE    TRUE,R1
                    MOVE    R1,[TempCH_Lock,A0]
                    BNIL    R0,^CH_2
                    HALT    haltMemFull  ;Give up if two successive compactions don't free enough space.
CH_Done:            MOVE    A1,R0  ;Make sure that A1 is valid.
                    ROT     R0,-invalid,R0
                    BF      R0,^CH_A1Valid
                    MOVE    ID1,R0  ;If not, re-xlate it.
                    XLATE   R0,objectXLATE,A1
CH_A1Valid:         MOVE    [TempCH_A3,A0],R0  ;Restore A3, Q, and ID3.
                    MOVE    R0,A3
                    MOVE    [TempCH_ID3,A0],R0
                    EQ      R0,TRUE,R1
                    BT      R1,^CH_DoneQ
                    MOVE    R0,ID3
                    MOVE    [TempCH_R2,A0],R2
                    BR      ^CH_Done2
CH_DoneQ:           MOVE    R0,Q
                    MOVE    [TempCH_R2,A0],R2  ;Restore R2.
CH_Done2:           MOVE    [TempCH_R3,A0],R3  ;Restore R3 and return.
                    MOVE    [TempCH_FIP,A0],IP
CH_MarkLive:        ROT     R0,-hdrPurgeableN,R1  ;If this object is purgeable, mark it so that it will be purged
                    BF      R1,^CH_Live  ;on the next scan.
                    OR      R0,hdrMarked,R0
                    MOVE    R0,[R2,A0]
CH_Live:            AND     R0,hdrLengthM,R0  ;Store the length of the object in R0.
                    ROT     R3,baseN,R1  ;Point A2 to the destination object.
                    ADD     R1,R0,R1
                    MOVE    R1,A2
                    ROT     R2,baseN,R1  ;Point A3 to the source object.
                    ADD     R1,R0,R1
                    ADD     R2,R0,R2  ;Advance the source and destination scanners.
                    ADD     R3,R0,R3
                    EQUAL   R2,R3,R0  ;There is no need to move an object if the source and destination
                    BT      R0,^CH_Compact  ;addresses are the same.
                    MOVE    R1,A3
                    MOVE    [objectID,A3],R1  ;Update the object's binding in the BRAT.
                    MOVE    R2,[TempCH_Src,A0]  ;Save R2.
                    CALL    lookupBinding
                    CHECK   R0,ADDR,R1  ;Make sure that the binding is an ADDR.
                    BF      R1,^CH_BRATError
                    MOVE    A2,R0
                    OR      R0,rel,R0
                    MOVE    R0,[R2,A0]
                    MOVE    0,R0  ;Move the object to its destination location.
                    CALL    blockMove
                    MOVE    [TempCH_Src,A0],R2  ;Restore R2.
                    BR      ^CH_Compact
CH_AlignError:      HALT    haltHeap
CH_BRATError:       HALT    haltBRATType

fltCompactHeap = IP:abs|fault|unchecked|CompactHeap<<offsetN
Allocate and initialize a new heap object. R0 contains the word to be stored as the
first word of the object. The length is extracted from R0, and the flags in the
high bits of R0 should be set to benign values. R1 contains the ID for the object.
The object is not entered in the XLATE and the BRAT tables.

Call: allocObject

In:  R0   First word of object.
     R1   ID of the object.

Out: AID2 ^Object.
     R0   ADDR pointing to object.

Criticality 3.

Alters R0/R2/R3/AID2.
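
The allocation strategy in this routine is a bump allocation from FirstFree that always leaves three words free below LastFree and retries after a compaction. A minimal Python sketch of that control flow follows, with assumptions flagged: `Heap`, `alloc` and the `compact` callback are hypothetical names, and the default callback simply fails instead of compacting.

```python
RESERVE = 3  # words kept free in case a BRAT entry must be allocated


def _no_compaction(heap, needed):
    """Stand-in for compactHeap (assumption): give up immediately."""
    raise MemoryError("heap full: need %d words" % needed)


class Heap:
    def __init__(self, start, end, compact=_no_compaction):
        self.first_free = start  # next word available to objects
        self.last_free = end     # upper bound of the free region
        self.compact = compact   # called as compact(heap, words_needed)

    def alloc(self, length):
        """Return the base index of a new object of `length` words."""
        while True:
            new_first = self.first_free + length
            if new_first <= self.last_free - RESERVE:
                base = self.first_free
                self.first_free = new_first
                return base
            # Ask for the object's length plus the three reserved words.
            self.compact(self, length + RESERVE)
```

A caller-supplied `compact` must either free space by lowering `first_free` or raise; otherwise the retry loop would not terminate, mirroring the listing's halt when memory is exhausted.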



AllocObject:        MOVE    R0,R3  ;Save R0. Criticality 6.
                    MOVE    R1,ID2  ;Store the object's ID in ID2.
AO_Retry:           MOVE    [FirstFree,A0],R1  ;Advance the heap scanner.
                    AND     R3,hdrLengthM,R2
                    ADD     R1,R2,R2
                    MOVE    [LastFree,A0],R0  ;Always leave three words on the heap in case a BRAT entry
                    SUB     R0,3,R0  ;needs to be allocated for this object.
                    GT      R2,R0,R0  ;Check whether the heap overflowed.
                    BF      R0,^AO_1
                    AND     R3,hdrLengthM,R0  ;If it did, compact the heap, telling the compactor that
                    ADD     R0,3,R0  ;at least three plus the length of the object words are needed.
                    MOVE    FIP,R2
                    MOVE    R2,F  ;Criticality 3.
                    CALL    compactHeap
                    MOVE    TRUE,R0
                    MOVE    R0,F  ;Criticality 6.
                    MOVE    R2,FIP
                    BR      ^AO_Retry  ;Go try the allocation again.
AO_1:               MOVE    R2,[FirstFree,A0]
                    ROT     R1,baseN,R1  ;Create a base/address pair for the object.
                    AND     R3,hdrLengthM,R0
                    OR      R1,R0,R0
                    OR      R0,rel,R0  ;Mark the object as relocatable.
                    WTAG    R0,ADDR,R0
                    MOVE    R0,A2  ;Store a pointer to the object in A2.
                    MOVE    ID2,R1
                    MOVE    R3,[objectHeader,A2]  ;Write the object's header and ID.
                    MOVE    R1,[objectID,A2]
                    MOVE    FIP,IP

fltAllocObject = IP:abs|fault|unchecked|AllocObject<<offsetN
## BRAT Manager ##



Enter a binding of R1 to R0 in the BRAT. The BRAT should not have an existing
binding of R1. It may also be appropriate to enter the object into the xlate table.

Call: enterBinding

In: R0 Data.
    R1 Key.

Criticality 3.

Alters R0-R3/AID2.
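
The hash code used to pick a BRAT chain is computed by three rotate-and-XOR steps followed by a mask. The Python sketch below mirrors that sequence for illustration only; the 32-bit word width and the value chosen for `BRAT_LEN_LOG` are assumptions, and `rotr` and `brat_hash` are hypothetical names.

```python
WORD_BITS = 32                   # assumed width of the rotated key
BRAT_LEN_LOG = 4                 # assumed log2 of the table length
BRAT_LENGTH = 1 << BRAT_LEN_LOG
MASK = (1 << WORD_BITS) - 1


def rotr(x, n):
    """Rotate a WORD_BITS-wide value right by n bits (ROT with a negative count)."""
    n %= WORD_BITS
    return ((x >> n) | (x << (WORD_BITS - n))) & MASK


def brat_hash(key):
    """Fold the key into a bucket index between 0 and BRAT_LENGTH-1."""
    x = key & MASK
    x ^= rotr(x, BRAT_LEN_LOG * 4)
    x ^= rotr(x, BRAT_LEN_LOG * 2)
    x ^= rotr(x, BRAT_LEN_LOG)
    return x & (BRAT_LENGTH - 1)
```

With these assumed sizes the three folds XOR together all eight 4-bit fields of the key, which is what the listing's comment about BRATLenLog-bit fields suggests.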



EnterBinding:       MOVE    R0,R3  ;Criticality 6. Save data in R3.
                    MOVE    R1,[TempEB_Key,A0]  ;Save the key.
EB_1:               ROT     R1,-BRATLenLog*4,R2  ;Calculate the hash code for the key.
                    XOR     R1,R2,R2  ;The hash code is the XOR of the BRATLenLog-bit fields of
                    ROT     R2,-BRATLenLog*2,R0  ;the key.
                    XOR     R2,R0,R2
                    ROT     R2,-BRATLenLog,R0
                    XOR     R2,R0,R2
                    MOVE    BRATLength-1,R0
                    AND     R2,R0,R2  ;R2 contains a hash code between 0 and BRATLength-1.
                    DC      BRATStart
                    ADD     R2,R0,R0  ;R0 points to the head of the BRAT chain.
                    MOVE    [BRATFree,A0],R2  ;R2 points to a free BRAT link.
                    BNIL    R2,^EB_BRATFull  ;Compact the heap if the BRAT is full.
EB_2:               MOVE    [TempEB_Key,A0],R1
                    MOVE    R1,[R2,A0]  ;Store the key in the link.
                    MOVE    [R0,A0],R1  ;Save the second link in the chain in R1.
                    MOVE    R2,[R0,A0]  ;Make this link be the first in the chain.
                    ADD     R2,1,R2
                    MOVE    R3,[R2,A0]  ;Store the data in the link.
                    ADD     R2,1,R2
                    MOVE    [R2,A0],R0
                    MOVE    R0,[BRATFree,A0]  ;Put the next free link in BRATFree.
                    MOVE    R1,[R2,A0]  ;Link with the second link in the chain.
                    MOVE    FIP,IP
EB_BRATFull:        MOVE    [LastFree,A0],R2  ;Attempt to allocate three words from the back of the heap.
                    SUB     R2,3,R2
                    GE      R2,[FirstFree,A0],R1
                    BF      R1,^EB_HeapFull  ;Go compact the heap if it was full.
                    MOVE    R2,[LastFree,A0]
                    ADD     R2,2,R2  ;Store a NIL in the link word of the new entry and go
                    MOVE    NIL,R1  ;allocate this entry in the BRAT.
                    MOVE    R1,[R2,A0]
                    SUB     R2,2,R2
                    BR      ^EB_2
EB_HeapFull:        MOVE    FIP,R2  ;Save the FIP.
                    MOVE    3,R0  ;At least three free words are needed on the heap.
                    MOVE    R2,F  ;Criticality 3.
                    CALL    compactHeap
                    MOVE    TRUE,R0
                    MOVE    R0,F  ;Criticality 6.
                    MOVE    R2,FIP  ;Restore the FIP.
                    MOVE    [TempEB_Key,A0],R1  ;Restore the key and go back to the beginning.
                    BR      ^EB_1

fltEnterBinding = IP:abs|fault|unchecked|EnterBinding<<offsetN



Look up a binding of R1 in the BRAT. Return the binding or NIL if there isn't any.
Also return the absolute address of the binding in the BRAT so that it can be
modified.

Call: lookupBinding

In:  R1 Key.

Out: R0 Data or NIL if none.
     R2 Absolute address of data in the BRAT (valid only when R0<>NIL).

Criticality 5.

Alters R0/R2.
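
The BRAT routines walk hash chains of three-word links (key, data, next) and recycle links through a free list. The Python sketch below models that layout in a flat list to show how enter, lookup and delete fit together. It is illustrative only: the bucket function is a placeholder rather than the listing's rotate-and-XOR hash, running out of links raises an error instead of taking words from the back of the heap, and `Brat` and its methods are hypothetical names.

```python
NIL = None


class Brat:
    def __init__(self, buckets, links):
        self.heads = [NIL] * buckets  # one chain head per hash bucket
        self.mem = []                 # flat store of links: key, data, next
        self.free = NIL               # head of the free-link list
        for _ in range(links):
            self.mem += [NIL, NIL, self.free]
            self.free = len(self.mem) - 3

    def _bucket(self, key):
        return key % len(self.heads)  # placeholder hash (assumption)

    def enter(self, key, data):
        """Push a new binding at the head of its chain."""
        link = self.free
        if link is NIL:
            raise MemoryError("no free BRAT links")
        self.free = self.mem[link + 2]
        b = self._bucket(key)
        self.mem[link:link + 3] = [key, data, self.heads[b]]
        self.heads[b] = link

    def lookup(self, key):
        """Return the bound data, or NIL if the key has no binding."""
        link = self.heads[self._bucket(key)]
        while link is not NIL:
            if self.mem[link] == key:
                return self.mem[link + 1]
            link = self.mem[link + 2]
        return NIL

    def delete(self, key):
        """Unlink the binding and return its link to the free list."""
        b = self._bucket(key)
        prev, link = NIL, self.heads[b]
        while link is not NIL and self.mem[link] != key:
            prev, link = link, self.mem[link + 2]
        if link is NIL:
            raise KeyError(key)  # the listing halts in this case
        nxt = self.mem[link + 2]
        if prev is NIL:
            self.heads[b] = nxt
        else:
            self.mem[prev + 2] = nxt
        self.mem[link + 2] = self.free
        self.free = link
```

Pushing new links at the head keeps insertion constant-time, at the cost of a linear walk on lookup and delete when a chain grows long.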



LookupBinding:      ROT     R1,-BRATLenLog*4,R2  ;Criticality 6.
                    XOR     R1,R2,R2  ;Calculate the hash code for R1.
                    ROT     R2,-BRATLenLog*2,R0  ;The hash code is the XOR of the four bytes of R1,
                    XOR     R2,R0,R2  ;the same as the XLATE hash code.
                    ROT     R2,-BRATLenLog,R0
                    XOR     R2,R0,R2
                    MOVE    BRATLength-1,R0
                    AND     R2,R0,R2  ;R2 contains a hash code between 0 and BRATLength-1.
                    DC      BRATStart-2
                    ADD     R2,R0,R0
LB_Next:            ADD     R0,2,R0  ;Follow the linked list of BRAT entries starting with
                    MOVE    [R0,A0],R0  ;the one in R0. Leave if R0 is NIL.
                    BNIL    R0,^LB_Done
                    EQ      R1,[R0,A0],R2  ;Compare the key against R1.
                    BF      R2,^LB_Next  ;Check the next entry if it doesn't match.
                    ADD     R0,1,R2  ;Otherwise return this entry's data.
                    MOVE    [R2,A0],R0
                    MOVE    FIP,IP
LB_Done:            MOVE    FIP,IP

fltLookupBinding = IP:abs|fault|unchecked|LookupBinding<<offsetN



Delete a binding of R1 in the BRAT. Halt if no such binding existed.
The purgeBinding entry point also purges the binding from the xlate table.

Call: deleteBinding
Call: purgeBinding

In: R1 Key.

Criticality 5.

Alters R0.

PurgeBinding:       MOVE    NIL,R0  ;Criticality 6. Purge the object's binding from the XLATE table.
                    ENTER   R1,R0
DeleteBinding:      MOVE    R2,FOP0  ;Criticality 6. Save R2 and R3.
                    MOVE    R3,FOP1
                    ROT     R1,-BRATLenLog*4,R2  ;Calculate the hash code for R1.
                    XOR     R1,R2,R2  ;The hash code is the XOR of the four bytes of R1,
                    ROT     R2,-BRATLenLog*2,R0  ;the same as the XLATE hash code.
                    XOR     R2,R0,R2
                    ROT     R2,-BRATLenLog,R0
                    XOR     R2,R0,R2
                    MOVE    BRATLength-1,R0
                    AND     R2,R0,R2  ;R2 contains a hash code between 0 and BRATLength-1.
                    DC      BRATStart-2
                    ADD     R2,R0,R2
DB_Next:            ADD     R2,2,R0  ;Follow the linked list of BRAT entries starting with
                    MOVE    [R0,A0],R2  ;the one in R0. Leave if R0 is NIL.
                    BNIL    R2,^DB_Halt
                    EQ      R1,[R2,A0],R3  ;Compare the key against R1.
                    BF      R3,^DB_Next  ;Check the next entry if it doesn't match.
                    ADD     R2,2,R2  ;Otherwise delete this entry.
                    MOVE    [R2,A0],R3
                    MOVE    R3,[R0,A0]
                    MOVE    [BRATFree,A0],R3
                    MOVE    R3,[R2,A0]
                    SUB     R2,2,R2
                    MOVE    R2,[BRATFree,A0]
                    MOVE    FOP1,R3
                    MOVE    FOP0,R2
                    MOVE    FIP,IP
DB_Halt:            HALT    haltBRATDelete

fltPurgeBinding = IP:abs|fault|unchecked|PurgeBinding<<offsetN
fltDeleteBinding = IP:abs|fault|unchecked|DeleteBinding<<offsetN
## Object and Context Manager ##



Allocate and initialize a new object on the local heap and enter it in the XLATE and
BRAT tables. R0 contains the class of the object.

Call: newLocalObject

In:  R0 Object's class.

Out: R0 Object's ID.

Criticality 1.

Alters R0-R3/AID2.


Allocate and initialize a new object on the local heap and enter it in the XLATE and
BRAT tables. R0 contains the word to be stored as the first word of the object.
The length is extracted from R0, and the flags in the high bits of R0 should be set
to benign values. The object gets the next unused ID.

Call: allocNextObject

In:  R0   First word of object.

Out: AID2 ^Object.
     R0   Object's ID.

Criticality 3.

Alters R0-R3/AID2.


Allocate and initialize a new object on the local heap and enter it in the XLATE and
BRAT tables. R0 contains the word to be stored as the first word of the object.
The length is extracted from R0, and the flags in the high bits of R0 should be set
to benign values. R1 contains the ID to be used for the object.

Call: allocNewObject

In:  R0   First word of object.
     R1   Object's ID.

Out: AID2 ^Object.
     R0   Object's ID.

Criticality 3.

Alters R0-R3/AID2.



NewLocalObject:     MOVE    FIP,R2  ;Criticality 6.
                    MOVE    R2,F  ;Criticality 1.
                    XLATE   R0,objectXLATE,A2  ;Get the object's first word.
                    MOVE    [oClassWord,A2],R0
                    MOVE    [LastObjectID,A0],R1  ;Get the next object ID.
                    ADD     R1,(1<<serialN)-1,R1
                    ADD     R1,1,R1
                    MOVE    R1,[LastObjectID,A0]  ;Advance the object ID counter.
                    BR      ^ANO_2
AllocNextObject:    MOVE    [LastObjectID,A0],R1  ;Criticality 6. Get the next object ID.
                    ADD     R1,(1<<serialN)-1,R1
                    ADD     R1,1,R1
                    MOVE    R1,[LastObjectID,A0]  ;Advance the object ID counter.
AllocNewObject:     MOVE    FIP,R2  ;Criticality 6. Save FIP.
                    MOVE    R2,F  ;Criticality 5.
ANO_2:              MOVE    R2,[TempANO_FIP,A0]
                    CALL    allocObject  ;Allocate the object.
                    ENTER   R1,R0  ;Put it into the xlate cache and the BRAT table.
                    CALL    enterBinding
                    MOVE    ID2,R0  ;Load the object's ID into R0.
                    MOVE    [TempANO_FIP,A0],IP

fltNewLocalObject = IP:abs|fault|unchecked|NewLocalObject<<offsetN
fltAllocNextObject = IP:abs|fault|unchecked|AllocNextObject<<offsetN
fltAllocNewObject = IP:abs|fault|unchecked|AllocNewObject<<offsetN



Allocate and initialize a new context. If ID1 is non-NIL on entry, it points to a
context that should be deallocated; however, if A3 no longer points to the message,
before that context is deallocated, its locals in locations 2 through 15, inclusive,
are copied into the new context.

Call: newContext

In:  R0   First word of context, including desired length.
     AID1 ^Context or NIL if none already exists.

Out: AID1 New context.

Criticality 2.

Alters R0/R1/AID1.
NewContext:         MOVE    FIP,R1                  ;Criticality 6. Save R2, R3 and the FIP.
                    MOVE    R1,[TempNC_FIP,A0]
                    MOVE    R1,F                    ;Criticality 3.
                    MOVE    R2,[TempNC_R2,A0]
                    MOVE    ID2,R1                  ;Save ID2 in TempNC_ID2.
                    MOVE    R1,[TempNC_ID2,A0]
                    MOVE    R3,[TempNC_R3,A0]
                    CALL    allocNextObject         ;Create the context object.
                    MOVE    ID1,R0
                    BNIL    R0,^NC_NoOldCxt
                    MOVE    Q,R0                    ;If A3 did not point to a message, copy the old context's
                    BT      R0,^NC_HadMessage       ;locals into the new context.
                    MOVE    [2,A1],R0
                    MOVE    R0,[2,A2]
                    MOVE    [3,A1],R0
                    MOVE    R0,[3,A2]
                    MOVE    [4,A1],R0
                    MOVE    R0,[4,A2]
                    MOVE    [5,A1],R0
                    MOVE    R0,[5,A2]
                    MOVE    [6,A1],R0
                    MOVE    R0,[6,A2]
                    MOVE    [7,A1],R0
                    MOVE    R0,[7,A2]
                    MOVE    [8,A1],R0
                    MOVE    R0,[8,A2]
                    MOVE    [9,A1],R0
                    MOVE    R0,[9,A2]
                    MOVE    [10,A1],R0
                    MOVE    R0,[10,A2]
                    MOVE    [11,A1],R0
                    MOVE    R0,[11,A2]
                    MOVE    [12,A1],R0
                    MOVE    R0,[12,A2]
                    MOVE    [13,A1],R0
                    MOVE    R0,[13,A2]
                    MOVE    [14,A1],R0
                    MOVE    R0,[14,A2]
                    MOVE    [15,A1],R0
                    MOVE    R0,[15,A2]
NC_HadMessage:      CALL    disposeContext          ;Then dispose the old context.
NC_NoOldCxt:        MOVE    ID2,R1                  ;Point A1 and ID1 to the new context.
                    XLATE   R1,objectXLATE,A1
                    MOVE    [TempNC_ID2,A0],R2      ;Restore A2 and ID2.
                    XLATE   R2,restoreXLATE,A2
                    MOVE    [TempNC_R2,A0],R2       ;Restore R2 and R3.
                    MOVE    [TempNC_R3,A0],R3
                    MOVE    [TempNC_FIP,A0],IP



fltNewContext = IP:abs|fault|unchecked|NewContext<<offsetN



;| Deallocate a context, which may be either a fast context or a heap context.
;|
;| Call: disposeContext
;| In:   AID1  Context.
;| Criticality 3.
;| Alters R0-R2/AID1.

;| Dispose an object. If the object is locked, it is deleted as soon as the unlock
;| message comes in.
;|
;| Call: disposeObject
;| In:   R0  Object.
;| Criticality 3.
;| Alters R0-R2.

DisposeFastContext: MOVE    [FastContextQueue,A0],R0 ;Criticality 6.
                    MOVE    R0,[contextNext,A1]     ;Put the context back on the context queue.
                    MOVE    ID1,R0
                    MOVE    R0,[FastContextQueue,A0]
                    MOVE    FIP,IP
DisposeContext:     MOVE    [contextHeader,A1],R0   ;Criticality 6. Check whether this was a fast context.
                    ROT     R0,-hdrFastN,R0
                    BT      R0,^DisposeFastContext  ;Yes.
                    MOVE    ID1,R0                  ;No. Deallocate a normal object.
DisposeObject:      MOVE    FIP,R2                  ;Criticality 6.
                    MOVE    R2,F                    ;Criticality 3.
                    XLATE   R0,localXLATE,R1        ;Get the object location into R1.
                    BNIL    R1,^DO_Done             ;Exit if the object was a constant.
                    MOVE    R2,[TempDO_FIP,A0]      ;Save the FIP.
                    CHECK   R1,INT,R2
                    BT      R2,^DO_Remote           ;Go send a Dispose message if the object is remote.
                    MOVE    TRUE,R2                 ;Enter unchecked mode.
                    MOVE    R2,U
                    MOVE    ID2,R2                  ;If the object is local, point AID2 to it.
                    MOVE    R0,ID2
                    MOVE    R1,A2
                    MOVE    [objectHeader,A2],R1
                    ROT     R1,-hdrLockedN,R1
                    BT      R1,^DO_Locked           ;Can't delete a locked object.
                    AND     R0,[NodeMask,A0],R1     ;Check whether this is the object's home.
                    MOVE    NNR,R0
                    EQUAL   R0,R1,R0
                    BT      R0,^DO_Home
                    MOVE    ID2,R1                  ;If not, send a message to the object's home to delete
                    DC      MSG:msgDisposeBRAT+2    ;its BRAT entry.
                    SEND20  R1,R0
                    SENDE0  R1
DO_Home:            CALL    deallocateObject        ;Deallocate it.
                    XLATE   R2,restoreXLATE,A2      ;Restore AID2.
                    MOVE    [TempDO_FIP,A0],IP
DO_Done:            MOVE    R2,IP
DO_Remote:          MOVE    R0,R2                   ;Send a Dispose message to the object's node.
                    SEND0   R1
                    DC      MSG:msgDispose+2
                    SEND2E0 R0,R2
                    MOVE    [TempDO_FIP,A0],IP
DO_Locked:          ROT     R1,hdrLockedN-hdrDeletedN,R0
                    OR      R0,1,R0                 ;If the object is locked, mark it as deleted but do not
                    ROT     R0,hdrDeletedN,R0       ;delete it yet.
                    MOVE    R0,[objectHeader,A2]
                    MOVE    [TempDO_FIP,A0],IP

fltDisposeContext = IP:abs|fault|unchecked|DisposeContext<<offsetN
fltDisposeObject = IP:abs|fault|DisposeObject<<offsetN



;| Execute a Dispose message.

Dispose:            MOVE    [disposeID,A3],R0       ;Criticality 2.
                    CALL    disposeObject           ;Dispose the object.
                    SUSPEND

msgDispose = Dispose<<offsetN



;| Execute a DisposeBRAT message. If the object was present on its home node, it is
;| disposed; otherwise, only the object's home BRAT entry is deleted.

DisposeBRAT:        MOVE    [disposeBRATID,A3],R1   ;Criticality 2.
                    XLATE   R1,localXLATE,R2        ;Check if the object is here too.
                    CHECK   R2,ADDR,R3
                    BT      R3,^DBRAT_Here          ;If so, dispose it.
                    CALL    purgeBinding            ;Purge the object's binding from the XLATE table and
                    SUSPEND                         ;from the BRAT.
DBRAT_Here:         MOVE    R1,R0
                    CALL    disposeObject           ;Dispose the object.
                    SUSPEND

msgDisposeBRAT = DisposeBRAT<<offsetN



;| Deallocate an object residing on this node. The object must not be locked.
;|
;| Call: deallocateObject
;| In:   AID2  Object.
;| Criticality 4.
;| Alters R0/R1.

DeallocateObject:   MOVE    FIP,R1                  ;Criticality 6. Save the FIP.
                    MOVE    R1,[TempDealloc_FIP,A0]
                    DC      OBJ:hdrDeleted
                    MOVE    R1,F                    ;Criticality 4.
                    OR      R0,[objectHeader,A2],R0 ;Set the deleted flag in the object header.
                    MOVE    R0,[objectHeader,A2]
                    MOVE    [objectID,A2],R1        ;Delete the object's binding from the BRAT and the xlate table.
                    CALL    purgeBinding
                    MOVE    NIL,R0
                    MOVE    R0,ID2                  ;Clear ID2.
                    MOVE    [TempDealloc_FIP,A0],IP

fltDeallocateObject = IP:abs|fault|unchecked|DeallocateObject<<offsetN
;###########################
;##                       ##
;## Global Object Manager ##
;##                       ##
;###########################



;| Allocate and initialize a new object on the heap of a random node and enter it in
;| that node's XLATE and BRAT tables. R0 contains the class of the object.
;|
;| Call: newObject
;| In:   R0  Object's class.
;| Out:  R0  Object's ID.
;| Criticality 1.
;| Alters R0/R1.

NewObject:          MOVE    FIP,R1                  ;Criticality 6.
                    MOVE    R1,F                    ;Criticality 3.
                    MOVE    R1,[contextIP,A1]       ;Save the state in the context.
                    MOVE    R2,[contextR2,A1]
                    MOVE    R3,[contextR3,A1]
                    MOVE    R0,R3                   ;Save the class in R3.
                    DC      RandomSeedIncrement     ;Generate a random node number.
                    MOVE    [RandomSeed,A0],R2
                    ADD     R2,R0,R2                ;Advance the random node counter and return its new value.
                    MOVE    R2,[RandomSeed,A0]
                    AND     R2,[NodeMask,A0],R2
                    DC      MSG:msgNewObject+4      ;Send a NewObject message to that node.
                    SEND20  R2,R0
                    MOVE    ID1,R2
                    SEND20  R3,R2
                    SENDE0  contextR0
                    DC      CFUT:contextR0          ;Tell the context to wait for the quasi-cfuture in R0.
                    MOVE    R0,[contextNext,A1]
                    MOVE    R0,[contextR0,A1]       ;Store a cfuture in R0.
                    DC      SaveStateID023-(*+2)    ;Go save the context and suspend.
                    BR      R0

fltNewObject = IP:abs|fault|unchecked|NewObject<<offsetN
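The node selection in NewObject can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the increment of 7 is borrowed from the literal used in RandomConst (the value of RandomSeedIncrement is not given in this listing), and the 32-bit wraparound is assumed.

```python
def next_random_node(seed, increment, node_mask):
    """Advance the per-node seed and derive a node number from it.

    Mirrors the ADD / MOVE / AND sequence: the new seed is stored back,
    and the node number is the seed masked with NodeMask.
    """
    seed = (seed + increment) & 0xFFFFFFFF   # assumed 32-bit wraparound
    return seed, seed & node_mask


# Walk a hypothetical 16-node machine (node_mask = 0xF) with increment 7.
seed = 5
visited = []
for _ in range(16):
    seed, node = next_random_node(seed, 7, 0xF)
    visited.append(node)
```

With an odd increment and a power-of-two node count, the sequence visits every node once before repeating, so allocations spread evenly even though the counter is not random in any statistical sense.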



;| Execute a NewObject message. Return the object's ID to the caller.

NewObjectM:         MOVE    [newObjClass,A3],R0     ;Criticality 0.
                    CALL    newLocalObject          ;Allocate the object locally.
                    MOVE    R0,R2
                    DC      MSG:msgReply+4          ;Reply with the object's ID.
                    MOVE    [newObjReplyID,A3],R1
                    SEND20  R1,R0
                    SEND0   R1
                    SEND2E0 [newObjReplySlot,A3],R2
                    SUSPEND

msgNewObject = NewObjectM<<offsetN



;| Return the class of an object. TypeOf returns the class as an integer, while
;| classOf wraps it as a class. The argument of TypeOf must not be a future.
;|
;| Call: classOf
;| Call: typeOf
;| In:   R0  Object.
;| Out:  R0  The object's class.
;| Criticality 1.
;| Alters R0/R1/AID2.

ClassOf:            MOVE    FIP,R1                  ;Criticality 6. Save the FIP.
                    MOVE    R1,F                    ;Criticality 1.
                    BNNIL   R0,^TOf_2               ;Force the argument.
TOf_2:              MOVE    R1,[TempTOf_FIP,A0]     ;Save the FIP in memory.
                    CALL    typeOf                  ;Get the integer type and write its tag and subtag.
                    MOVE    subCLASS,R1
                    ROT     R1,subtagN,R1
                    OR      R0,R1,R0
                    WTAG    R0,TAG0,R0
                    MOVE    [TempTOf_FIP,A0],IP

fltClassOf = IP:abs|fault|ClassOf<<offsetN

TypeOf:             RTAG    R0,R1                   ;Criticality 6. Dispatch on the tag of the object.
                    BR      R1
                    ROT     R0,-subtagN,R1          ;TAG0
                    BR      ^TOf_TAG0
                    BR      ^TOf_Integer            ;INT
                    BT      R0,^TOf_True            ;BOOL
                    BR      ^TOf_False
                    HALT    haltTypeOf              ;ADDR
                    HALT    haltTypeOf              ;IP
                    HALT    haltTypeOf              ;MSG / OBJ
                    HALT    haltTypeOf              ;CFUT
                    HALT    haltTypeOf              ;FUT.
                    MOVE    FIP,R1                  ;ID. Save the FIP.
                    BR      ^TOf_Object
                    MOVE    FIP,R1                  ;DID. Save the FIP.
                    BR      ^TOf_Object
                    HALT    haltTypeOf              ;TAGA
                    BR      ^TOf_Float              ;FLOAT
                    HALT    haltTypeOf              ;INST0
                    HALT    haltTypeOf              ;INST1
                    HALT    haltTypeOf              ;INST2
                    HALT    haltTypeOf              ;INST3
TOf_Integer:        DC      classInteger            ;Return the integer class.
                    MOVE    FIP,IP
TOf_True:           DC      classTrue               ;Return the true class.
                    MOVE    FIP,IP
TOf_False:          DC      classFalse              ;Return the false class.
                    MOVE    FIP,IP
TOf_Float:          DC      classFloat              ;Return the float class.
                    MOVE    FIP,IP
TOf_Object:         MOVE    R1,F                    ;Criticality 1.
                    XLATE   R0,objectXLATE,A2
                    MOVE    [objectHeader,A2],R0    ;Extract the class from the object header.
                    WTAG    R0,INT,R0
                    ROT     R0,-hdrClassN,R0
                    AND     R0,hdrClassM,R0         ;R0 now contains the class.
                    MOVE    R1,IP
TOf_TAG0:           AND     R1,subtagM,R1           ;Dispatch on the subtag.
                    BR      R1
                    BNNIL   R0,^TOf_Symbol          ;subSYM
                    BR      ^TOf_NIL
                    MOVE    FIP,R1                  ;subCLASS. Save the FIP.
                    BR      ^TOf_Object
                    BR      ^TOf_Selector           ;subSEL
                    BR      ^TOf_Character          ;subCHAR
                    HALT    haltTypeOf
                    HALT    haltTypeOf
                    HALT    haltTypeOf
                    HALT    haltTypeOf
                    HALT    haltTypeOf
                    HALT    haltTypeOf
                    HALT    haltTypeOf
                    HALT    haltTypeOf
                    HALT    haltTypeOf
                    HALT    haltTypeOf
                    HALT    haltTypeOf
                    HALT    haltTypeOf
TOf_Symbol:         DC      classSymbol             ;Return the symbol class.
                    MOVE    FIP,IP
TOf_NIL:            DC      classNull               ;Return the null class.
                    MOVE    FIP,IP
TOf_Selector:       DC      classSelector           ;Return the selector class.
                    MOVE    FIP,IP
TOf_Character:      DC      classCharacter          ;Return the character class.
                    MOVE    FIP,IP

fltTypeOf = IP:abs|fault|unchecked|TypeOf<<offsetN
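The two-level dispatch in TypeOf (first on the primary tag, then on the subtag for TAG0 words) can be sketched in Python. This is a rough model, not the kernel's representation: the `Word` record, the `TypeFault` exception, and the use of class names as strings instead of integer class numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


class TypeFault(Exception):
    """Stands in for HALT haltTypeOf."""


@dataclass
class Word:
    tag: str                 # primary tag, e.g. "INT", "BOOL", "TAG0", "ID"
    value: object = None     # payload; None models NIL for subSYM words
    subtag: str = ""         # only meaningful when tag == "TAG0"
    header_class: str = ""   # class stored in the referenced object's header


def type_of(word):
    """Return the class of a word, dispatching on tag and then on subtag."""
    if word.tag == "INT":
        return "classInteger"
    if word.tag == "BOOL":
        return "classTrue" if word.value else "classFalse"
    if word.tag == "FLOAT":
        return "classFloat"
    if word.tag in ("ID", "DID"):
        return word.header_class          # class comes from the object header
    if word.tag == "TAG0":
        if word.subtag == "subSYM":
            return "classNull" if word.value is None else "classSymbol"
        if word.subtag == "subCLASS":
            return word.header_class      # class objects are looked up like IDs
        if word.subtag == "subSEL":
            return "classSelector"
        if word.subtag == "subCHAR":
            return "classCharacter"
    raise TypeFault(word.tag)             # ADDR, IP, MSG, CFUT, FUT, INSTn, ...
```

The assembly version gets the same effect with a branch table indexed by the tag, which avoids a chain of comparisons; the sketch trades that speed for readability.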



;| Return the node on which the object might reside. If the object is a constant,
;| return a random node number. If the object is a DID, return a random constituent.
;|
;| Call: objectNode
;| In:   R0  Object.
;| Out:  R1  Number of node likely to contain object. The number may not necessarily be
;|           tagged INTeger, and it may contain junk data in the high 16 bits.
;| Criticality 5.
;| Alters R0/R1.

ObjectNode:         RTAG    R0,R1                   ;Criticality 6.
                    BR      R1                      ;Dispatch on the tag of the object.
                    MOVE    [RandomSeed,A0],R1      ;TAG0
                    BR      ^ON_Random
                    MOVE    [RandomSeed,A0],R1      ;INT
                    BR      ^ON_Random
                    MOVE    [RandomSeed,A0],R1      ;BOOL
                    BR      ^ON_Random
                    HALT    haltInternalType        ;ADDR
                    HALT    haltInternalType        ;IP
                    HALT    haltInternalType        ;MSG / OBJ
                    HALT    haltInternalType        ;CFUT
                    MOVE    R0,R1                   ;FUT. Return the FUT; the node number is in the low
                    MOVE    FIP,IP                  ;16 bits.
                    MOVE    R0,R1                   ;ID. Return the ID; the node number is in the low
                    MOVE    FIP,IP                  ;16 bits.
                    MOVE    R0,R1                   ;DID
                    BR      ^RandomConst
                    MOVE    [RandomSeed,A0],R1      ;TAGA
                    BR      ^ON_Random
                    MOVE    [RandomSeed,A0],R1      ;FLOAT
                    BR      ^ON_Random
                    HALT    haltInternalType        ;INST0
                    HALT    haltInternalType        ;INST1
                    HALT    haltInternalType        ;INST2
                    HALT    haltInternalType        ;INST3
ON_Random:          DC      RandomSeedIncrement     ;Advance the random node counter and return its new value.
                    ADD     R1,R0,R1
                    MOVE    R1,[RandomSeed,A0]
                    AND     R1,[NodeMask,A0],R1
                    MOVE    FIP,IP
RandomConst:        MOVE    R2,FOP0
                    ROT     R1,-(logStrideN+logStrideL),R2 ;R1 has s, the distobj initial node number, in bits 0..10
                    ASH     R2,-16,R2               ;and e, 2's complement logStride, in bits 11..15.
                    ASH     R2,logStrideL-16,R2     ;R2:=e.
                    MOVE    -1,R0
                    ASH     R0,R2,R2                ;R2:=e zeros in LSBs with the MSBs being ones.
                    MOVE    [RandomSeed,A0],R0
                    ADD     R0,7,R0                 ;Advance the random node counter and return its new value.
                    MOVE    R0,[RandomSeed,A0]
                    AND     R0,R2,R2
                    BR      ^GetConst

fltObjectNode = IP:abs|fault|unchecked|ObjectNode<<offsetN



;| Return the ID of the preferred constituent of a distributed object with the given
;| DID.
;|
;| Call: preferredConstituent
;| In:   R1  DID.
;| Out:  R1  ID.
;| Criticality 5.
;| Alters R0/R1.

PreferredConst:     MOVE    R2,FOP0                 ;Criticality 6.
                    ROT     R1,-(logStrideN+logStrideL),R2 ;R1 has s, the distobj initial node number, in bits 0..10
                    ASH     R2,-16,R2               ;and e, 2's complement logStride, in bits 11..15.
                    ASH     R2,logStrideL-16,R2     ;R2:=e.
                    LE      R2,0,R0
                    BT      R0,^PrefCnst_Dense      ;Jump to a faster routine if the distributed object is dense.
                    MOVE    -1,R0
                    ASH     R0,R2,R2                ;R2:=e zeros in LSBs with the MSBs being ones.
                    MOVE    [SerialNode,A0],R0
                    AND     R0,R2,R2                ;R2:=masked serial node number.
                    DC      initialNodeM
                    AND     R0,R1,R0
                    OR      R0,R2,R2                ;R2:=serial constituent node number.
                    DC      ID:~homeNodeM
                    AND     R0,R1,R1                ;R1:=ID:serial number.
                    AND     R2,xM,R0                ;Store the x node number in R1.
                    OR      R1,R0,R1
                    ROT     R2,-xL,R0               ;Store the y node number in R1.
                    AND     R0,yM,R0
                    ROT     R0,yN,R0
                    OR      R1,R0,R1
                    ROT     R2,-(xL+yL),R2          ;Store the z node number in R1.
                    AND     R2,zM,R2
                    ROT     R2,zN,R2
                    OR      R1,R2,R1
                    MOVE    FOP0,R2
                    MOVE    FIP,IP
PrefCnst_Dense:     DC      ID:~homeNodeM           ;R1:=ID:serial number.
                    AND     R0,R1,R1
                    MOVE    NNR,R0                  ;This is a dense distributed object; just use the current node.
                    OR      R1,R0,R1
                    MOVE    FOP0,R2
                    MOVE    FIP,IP

fltPreferredConstituent = IP:abs|fault|unchecked|PreferredConst<<offsetN
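The mask arithmetic in the sparse path of PreferredConst can be illustrated with a small Python sketch. It assumes, for illustration only, that the initial node number s fits in the low e bits and that node numbers are 16 bits wide; the function name and the `node_bits` parameter are hypothetical.

```python
def preferred_serial_node(serial_node, initial_node, log_stride, node_bits=16):
    """Pick the constituent node nearest to this node in serial order.

    (-1 << log_stride) has log_stride zeros in its low bits and ones above,
    so ANDing clears the low bits of the local serial node number; ORing in
    the distributed object's initial node then selects the constituent that
    shares this node's high-order bits.
    """
    mask = (-1 << log_stride) & ((1 << node_bits) - 1)
    return (serial_node & mask) | initial_node
```

For example, with a stride of 4 nodes (log_stride = 2) and an initial node of 1, serial nodes 44 through 47 all map to constituent node 45. The assembly routine then splits that serial number into x, y and z fields before merging it into the ID.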



I Return the ID of the nth constituent of a distributed object with the given DID. 



RO 

Rl 



ICriticality 1. 

I 

I Alters R0-R2. 



MOVE 
MOVE 
CHECK 



ROT 

ASH 

ASH 

LT 

BF 

NEG 

MOVE 

ASH 

NOT 

AND 

ROT 

ADD 

NEG 

ASH 



R2, 
R3, 
RO, 
R2, 
Rl, 
R2, 
RO, 
R2, 
Rl, 
R2, 
R2, 
R2, 
R3, 
R2, 
-1, 
R3, 
R3, 
RO, 
R3, 
Rl, 
R2, 
RO, 



FOPO :Criticality 

FOP1 

INT.R2 

~Co_BadType 

DID.R2 

~Co_BadType 

0,R2 

"Co_BadRange 

-(logStrideN+logStrideL),R2 



Save R2 and R3 . 



-16, R2 

logStrideL-16,R2 

0,R3 

"co_Sparse 

R2 

R3 

R2.R3 

R3 

R3,R3 

serialN,R3 

R3.R1 

R2 

R2.R0 



.-and 
; R2 : -e 



has s, the distobj initial node number, 
i complement logStride, in bits 11.. 15. 



in bits 0. .10 



.•There is at most one constituent per node. 
.•There are multiple constituents per node. 



R3 contains the number of constituents per node minus one. 
Modulo n by the number of constituents per node to get the 
displacement to be added to the DID's serial number. 



.-Divide n by the number of constituents per node. 
                    MOVE    0,R2                    ;Now assume that there is one constituent per node.
Co_Sparse:          FFB     R0,R3                   ;R3:=30-lg(n).
                    SUB     R3,R2,R3                ;R3:=30-lg(node#).
                    ASH     R0,R2,R2                ;R2:=node#.
                    MOVE    30-LogNNodes,R0         ;R0:=30-lg(NNodes).
                    GT      R3,R0,R0
                    BF      R0,^Co_BadRange
                    MOVE    FOP1,R3
                    BR      ^GetConst
Co_BadRange:        HALT    haltRange
Co_BadType:         CHECK   R0,FUT,R2               ;If either operand was a future, crash with the future fault;
                    BT      R2,^Co_Future           ;otherwise, crash with the type fault.
                    CHECK   R1,FUT,R2
                    BT      R2,^Co_Future
                    HALT    haltType
Co_Future:          HALT    haltFuture

IP:abs|fault|unchecked|Co<<offsetN



;| Execute a RequestObject message. The ID passed in the message must be a word tagged
;| ID, a class or a selector; it cannot be a future, constant, or distributed object.

RequestObject:      MOVE    [reqObjID,A3],R3        ;Criticality 4.
                    XLATE   R3,localXLATE,R1        ;Is the object here?
                    CHECK   R1,ADDR,R2
                    BT      R2,^RO_Local            ;Yes.
RO_Resend:          DC      MSG:msgRequestObject+3  ;No. Send a message requesting the object to the object's
                    SEND20  R1,R0                   ;likely location.
                    SEND0   R3
                    SENDE0  [reqObjReplyNode,A3]
                    SUSPEND
RO_Locked:          MOVE    NNR,R1                  ;Criticality 4. Resend the message back to this node.
                    BR      ^RO_Resend
RO_Local:           MOVE    R3,ID2                  ;Criticality 6. Point AID2 to the object.
                    MOVE    R1,A2
                    MOVE    [objectHeader,A2],R2
                    ROT     R2,-hdrLockedN,R0       ;Resend the message back to this node if the object is locked.
                    BT      R0,^RO_Locked
                    AND     R2,hdrLengthM,R3        ;The length of the message is one plus the length of the
                    DC      MSG:msgMigrateObject+1  ;object.
                    ADD     R0,R3,R0
                    SUB     R3,1,R3                 ;Save the last word of the object in R3.
                    MOVE    [R3,A2],R3
                    SUB     R1,1,R1                 ;Shorten the object's limit by one word.
                    MOVE    R1,A2
                    SEND20  [reqObjReplyNode,A3],R0 ;Send the message header.
                    SEND0   R2
                    MOVE    0,R0                    ;Send the words of the object.
                    CALL    blockSend
                    SENDE0  R3                      ;Send the last word of the object.
                    ROT     R2,-hdrCopyableN,R0     ;Leave the object here if it is copyable.
                    BT      R0,^RO_Copyable
                    MOVE    ID2,R1                  ;If the object is not copyable, purge it from the xlate table
                    AND     R1,[NodeMask,A0],R2     ;and from the BRAT, unless this is its home node, in which
                    MOVE    NNR,R3                  ;case purge it from the xlate table and replace its BRAT entry
                    EQUAL   R2,R3,R0                ;to suggest that it is present on this node; messages requesting
                    BT      R0,^RO_Home             ;the object will keep cycling at this node until the object's
                    CALL    deallocateObject        ;new location is known.
RO_Copyable:        SUSPEND
RO_Home:            DC      OBJ:hdrDeleted
                    OR      R0,[objectHeader,A2],R0 ;Set the deleted flag in the object header.
                    MOVE    R0,[objectHeader,A2]
                    MOVE    NIL,R0
                    ENTER   R1,R0
                    CALL    lookupBinding
                    BNIL    R0,^RO_NoBinding
                    MOVE    R3,[R2,A0]              ;Pretend that the object is located at this node.
                    SUSPEND
RO_NoBinding:       HALT    haltBRATMissing

msgRequestObject = unchecked|RequestObject<<offsetN



;| Execute an AcceptObject message. Make this node the object's home. The object's ID
;| must reflect this node as the object's home.

AcceptObject:       MOVE    [2+objectHeader,A3],R0  ;Criticality 3. Read the object's header and ID.
                    MOVE    [2+objectID,A3],R1
                    CALL    allocNewObject          ;Allocate space for the object.
                    MOVE    A2,R1                   ;Decrease A2's base by two words and increase its limit likewise
                    DC      (-(1<<baseN)+1)*2       ;because the object starts in the message two words late.
                    ADD     R1,R0,R0
                    MOVE    R0,A2
                    MOVE    4,R0                    ;Copy the object into the heap starting from the fifth word
                    CALL    blockMove               ;of the message (third word of the object).
                    MOVE    [1,A3],R3               ;Acknowledge the sender if an acknowledgement was requested.
                    BNIL    R3,^AO_Done
                    DC      MSG:msgAcknowledgeObject+2
                    SEND20  R3,R0
                    SENDE0  [2+objectID,A3]
AO_Done:            SUSPEND

msgAcceptObject = unchecked|AcceptObject<<offsetN
;| Execute a MigrateObject message. If the object is copyable, store a copy of it on
;| this node. If the object is not copyable, store it on this node, lock it, and
;| inform the home node about the object's presence here.



Migrateobject : 



MO_Copyable : 
MO_Noncopyable : 



MO_Unexpected : 
MO_Expected: 



MO NextRestart 



MO_Suspend: 



MOVE 

MOVE 

ROT 

AND 

BT 

AND 

MOVE 

EQUAL 

BT 

MOVE 

DC 

SEND20 

SEND2E0 

OR 

BR 

OR 

ROT 

CALL 

ENTER 

MOVE 

CALL 

BNIL 

MOVE 

MOVE 

BR 

MOVE 

CALL 

MOVE 

MOVE 

SUB 

MOVE 

MOVE 

CALL 

CHECK 

BF 

MOVE 

MOVE 

MOVE 

MOVE 

XLATE 

MOVE 

CHECK 

BF 

DC 

SEND20 

SENDEO 

MOVE 

BR 



[I+objectHeader, A31.R0 ;Crlticality 3. Read the object's header and ID. 

[l+objectID,A3),Rl 

RO, -hdrCopyableN, R0 

R0,SFFFFFFF1,R0 

RO, "MOCopyable 

Rl, [NodeMask,A0),R2 

NNR,R3 

R2,R3,R2 

R2, *MO_Noncopyable 

R0,R2 

MSG:msgUpdateHome + 3 

R1,R0 

R1,R3 

R2, 1« (hdrLockedN-hdrCopyableN) , RO 

~MO_Noncopyable .-Lock this object. 

RO, 1« (hdrPurgeableN-hdrCopyableN) , RO 

RO, hdrCopyableN, RO .-Allocate storage for the object and put it into the xlate table. 

allocObject 

R1,R0 

R0,R3 

lookupBinding 

RO, *MO_Unexpected 

R3, [R2,A0] 

R0,R2 

*MO_Expected 

R3.R0 

enterBinding 

NIL,R2 

A2,R0 

RO, (l«baseN]-l,RO 

R0,A2 

3,R0 

blockMove 

R2, ID.RO 

RO, "MO_Suspend 

[FastContextQueue, A0 ] , RO 

RO, [contextNext,Al] 

ID1,R0 

RO, [FastContextQueue, A0 J 

R2,objectXLATE,Al .-Restart contexts. 

[contextNext,Al),Rl 

Rl, ID,R3 

R3, "Reply_Restart .-Restart this context if there is only one to be restarted. 

MSG:msgRestartContext42 .-Otherwise send a message back to this node to restart this 

R2,R0 .-context and go restart the next one now. 

R2 

R1,R2 

~MO NextRestart 



.-Clear the purgeable, locked, and marked flags. 

,-If the object is copyable, make this copy purgeable. 

.-This object is noncopyable. 

.-Check whether this node is the object's home node. 

;If so, do nothing. 

.-Save RO. 

.-Otherwise, tell the home node about this object's location 

;and lock the object until the home node replies. 



,-Criticality S. 

.-Check whether a binding for the object existed in the BRAT. 

;If one did exist, save its data in R2 and rebind the BRAT 
,- entry to point to the object. 

.-This obeys criticality 5 because allocObject allocates three 
.-extra heap words. 

.-Decrease A2 ' s base by one word and increase its limit 
.-because the object starts in the message one word late. 

.-Copy the object into the heap starting from the fourth word 
,-of the message (third word of the object). 

.-Leave if there are no contexts to restart. 

.-Dispose the current context. 



msgMigrateObject = unchecked|MigrateObject<<offsetN



Execute a RestartContext message. 



I Execute a Reply message. 



RestartContext: MOVE 
MOVE 
MOVE 
MOVE 
MOVE 
XLATE 
BR 
MOVE 
MOVE 
MOVE 
MOVE 
MOVE 
XLATE 
MOVE 
MOVE 
MOVE 
XLATE 
MOVE 
XLATE 
MOVE 
XLATE 
MOVE 
MOVE 
MOVE 
MOVE 
MOVE 



Reply_2 : 



Reply_Restart : 



Reply : 



ReplyHalt: 



MOVE 

XLATE 

MOVE 

WTAG 

MOVE 

EQ 

MOVE 

BT 

EQ 



[FastContextQueue, A0 ] , RO 

RO, [contextNext.Al] 

ID1,R0 

RO, [FastContextQueue, A0 1 

[replyID,A3],R0 

RO,objectXLATE,Al 

*Reply_Restart 

Rl, [R0,A2] 

[FastContextQueue, A01 ,R0 

RO, (contextNext.Al) 

ID1.R0 

RO, [FastContextQueue, A0] 

R3,objectXLATE,Al 

FALSE, RO 

R0,Q 

[contextID0,Al),R0 

RO, restoreXLATE.AO 

[contextlD2,Al),R0 

RO, restoreXLATE, A2 

[contextID3,All,R0 

RO, restoreXLATE, A3 

[contextR3,Al!,R3 

[contextR2,Al],R2 

[contextRl,Al],Rl 

(contextRO,Alj ,R0 

[contextlP.Alj, IP 

[replyID,A3],R3 

R3,objectXLATE,A2 

[replySlot,A3],R0 

R0,CFOT,R0 

(contextNext,A2],Rl 

R1,R0,R2 

[replyValue,A3] ,R1 

R2, ~Reply_2 

RO, [R0,A21,R2 

R2, -ReplyHalt 

Rl, [R0.A2] 



.-Put the fast context back on the context queue. 



,-Criticality 3. 

,-Xlate the reply context into Al . 



.-Store the value replied. 

.-Put the fast context back on the context queue. 



MOVE 

SUSPEND 

HALT haltReply 



,-Xlate the reply context into Al . 
.-Turn off A3 queue wraparound. 



.-Restore the address and ID registers 



.-Restore the data registers. 



.-Resume computation of the message. 

,-Criticality 3. 

,-Xlate the reply context into A2 . 



.-Check whether the process was waiting for this slot. 



; Suspend if not. 

.-check the previous value from the slot and make 

.-sure it was the proper cfuture. 

;Store the new value there. 



msgRestartContext = unchecked|RestartContext<<offsetN
msgReply = unchecked|Reply<<offsetN
;| Execute an UpdateHome message. Update the BRAT to contain the object's new home
;| location and send an Unlock message to the object to allow it to move again.

UpdateHome:         MOVE    [updtHomeID,A3],R1      ;Criticality 2.
                    SEND0   [updtHomeNode,A3]       ;Send an Unlock message back to the object.
                    DC      MSG:msgUnlock+2
                    SEND2E0 R0,R1
                    CALL    lookupBinding           ;Look the object up in the BRAT.
                    BNIL    R0,^UH_Halt             ;The BRAT entry should be present.
                    CHECK   R0,INT,R3
                    BF      R3,^UH_Waiting
                    MOVE    [updtHomeNode,A3],R0    ;Change the BRAT entry to reflect the
                    MOVE    R0,[R2,A0]              ;object's new location.
                    SUSPEND
UH_Waiting:         CHECK   R0,ID,R3
                    BF      R3,^UH_Halt
                    XLATE   R0,objectXLATE,A2
                    MOVE    [contextNext,A2],R0
                    BNIL    R0,^UH_Halt             ;The BRAT entry should be present.
                    CHECK   R0,INT,R3
                    BF      R3,^UH_Waiting
                    MOVE    [updtHomeNode,A3],R0    ;Change the BRAT entry to reflect the
                    MOVE    R0,[contextNext,A2]     ;object's new location.
                    SUSPEND
UH_Halt:            HALT    haltBRATMissing

msgUpdateHome = UpdateHome<<offsetN



;| Execute an Unlock message. Unlock the object to allow it to move again. If the
;| object was marked deleted, dispose it now.

Unlock:             MOVE    [unlockID,A3],R1        ;Criticality 2.
                    XLATE   R1,objectXLATE,A2       ;Find the object and clear its locked flag.
                    DC      ~hdrLocked
                    MOVE    [objectHeader,A2],R2
                    AND     R2,R0,R2
                    MOVE    R2,[objectHeader,A2]
                    ROT     R2,-hdrDeletedN,R2      ;If the object was marked deleted, dispose it now.
                    BF      R2,^Unlk_Done
                    MOVE    R1,R0
                    CALL    disposeObject
Unlk_Done:          SUSPEND

msgUnlock = unchecked|Unlock<<offsetN
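The interplay between disposeObject and Unlock (a locked object is only marked deleted, and the disposal is completed when the unlock arrives) can be modelled in a few lines of Python. The flag values, the `Node` class and its `heap` dictionary are hypothetical stand-ins; the real header layout (hdrLockedN, hdrDeletedN) is not given in this listing.

```python
HDR_LOCKED = 1 << 0    # assumed bit position, for illustration only
HDR_DELETED = 1 << 1   # assumed bit position, for illustration only


class Node:
    """Toy model of one node's heap: object ID -> header flags."""

    def __init__(self):
        self.heap = {}

    def dispose_object(self, oid):
        header = self.heap.get(oid)
        if header is None:
            return                                   # nothing to dispose here
        if header & HDR_LOCKED:
            self.heap[oid] = header | HDR_DELETED    # defer: mark as deleted
        else:
            del self.heap[oid]                       # deallocate immediately

    def unlock(self, oid):
        header = self.heap[oid] & ~HDR_LOCKED        # clear the locked flag
        self.heap[oid] = header
        if header & HDR_DELETED:
            self.dispose_object(oid)                 # finish the deferred disposal
```

Deferring the deletion this way means a migrating object is never freed while the home node is still updating its BRAT entry.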
;####################
;##                ##
;## Method Manager ##
;##                ##
;####################



;| Return the ID of a method associated with the given class and selector. The second
;| entry point, lookupMethodU, can be used when the class has already been type-checked
;| and coerced to be an integer.
;|
;| Call: lookupMethod
;| Call: lookupMethodU
;| In:   R0  Class (Tagged TAG0:subCLASS if lookupMethod is used, INT if lookupMethodU is used).
;|       R1  Selector (Tagged TAG0:subSEL).
;| Out:  R2  ID of method or NIL if none.
;| Criticality 1.
;| Alters R0-R3/AID2.

LookupMethod:       CHECK   R0,TAG0,R2              ;Criticality 6.
                    BF      R2,^LM_Halt             ;Make sure that R0 is tagged as a class.
                    WTAG    R0,INT,R0
                    ROT     R0,-subtagN,R3
                    AND     R0,csClassM,R0          ;Coerce it to an integer.
                    EQUAL   R3,subCLASS,R3
                    BF      R3,^LM_Halt
LookupMethodU:      ROT     R0,csClassN,R2          ;Criticality 6. Shift the class to the high 16 bits.
                    WTAG    R1,INT,R3
                    AND     R3,csSelectorM,R3
                    OR      R2,R3,R3                ;Make a class/selector pair in R3.
                    WTAG    R3,CS,R3
                    PROBE   R3,R2                   ;Get the cached ID of the method into R0, if there is one.
                    BNIL    R2,^LM_SendMsg
                    MOVE    FIP,IP
LM_Halt:            HALT    haltClassType
LM_SendMsg:         MOVE    FIP,R2                  ;Criticality 6.
                    MOVE    R2,F                    ;Criticality 3.
                    MOVE    R2,[contextIP,A1]       ;Save FIP in the context.
                    MOVE    R3,[contextR0,A1]       ;Save the class/selector pair in R0 of the saved context.
                    MOVE    R0,R3
                    AND     R1,[NodeMask,A0],R2
                    DC      TAG0:subCLASS<<subtagN  ;Generate the class ID in R3.
                    OR      R0,R3,R3
                    DC      MSG:msgApplyFunction+5  ;Send a LookupMethod message asking to lookup the method and
                    SEND20  R2,R0                   ;send a reply back to the context.
                    MOVE    ID1,R2
                    DC      LLookupMethod
                    SEND20  R0,R1
                    SEND2E0 R3,R2
                    MOVE    NIL,R1                  ;The LookupMethod handler will return the method via a
                    MOVE    R1,[contextNext,A1]     ;MethodReply, which will reply into the R2 slot of the context.
                    DC      SaveStateID023-(*+2)    ;There is no need to save the data registers in the context.
                    BR      R0

fltLookupMethod = IP:abs|fault|unchecked|LookupMethod<<offsetN
fltLookupMethodU = IP:abs|fault|unchecked|LookupMethodU<<offsetN
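The cache key built by LookupMethodU (class in the high half, selector in the low half) and the miss path can be sketched in Python. The 16-bit field widths follow the comment about shifting the class to the high 16 bits; the constant names, the dictionary standing in for the associative cache, and the helper functions are assumptions for illustration.

```python
CS_CLASS_N = 16           # assumed shift for the class field
CS_SELECTOR_M = 0xFFFF    # assumed mask for the selector field


def class_selector_key(class_num, selector):
    """Pack a class number and a selector into one 32-bit lookup key."""
    return ((class_num << CS_CLASS_N) | (selector & CS_SELECTOR_M)) & 0xFFFFFFFF


def lookup_method(cache, class_num, selector):
    """Return the cached method ID, or None when a remote lookup is needed.

    None plays the role of NIL from PROBE: the caller would then send a
    LookupMethod message and suspend until a MethodReply fills the cache.
    """
    return cache.get(class_selector_key(class_num, selector))


def method_reply(cache, key, method_id):
    """Model of MethodReply: enter the returned method ID into the cache."""
    cache[key] = method_id
```

A first lookup for a given pair misses and costs a message round trip; later lookups for the same pair hit the local cache.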



;| Execute a MethodReply message.

MethodReply:        MOVE    [methodReplyID,A3],R0   ;Criticality 3.
                    XLATE   R0,objectXLATE,A1       ;Xlate the reply context into A1.
                    MOVE    [methodReplyValue,A3],R2 ;Get the method ID.
                    MOVE    [contextR0,A1],R0
                    ENTER   R0,R2                   ;Enter it into the cache.
                    MOVE    FALSE,R0                ;Turn off A3 queue wraparound.
                    MOVE    R0,Q
                    MOVE    [contextID0,A1],R0      ;Restore the address and ID registers.
                    XLATE   R0,restoreXLATE,A0
                    MOVE    [contextID2,A1],R0
                    XLATE   R0,restoreXLATE,A2
                    MOVE    [contextID3,A1],R0
                    XLATE   R0,restoreXLATE,A3
                    MOVE    [contextIP,A1],IP       ;Resume computation.

msgMethodReply = MethodReply<<offsetN
;###############
;##           ##
;## Utilities ##
;##           ##
;###############



I Divide Rl by RO. Return the quotient and remainder. The magnitude of the remainder I 
I is always less than the magnitude of the divisor, and the sign of the remainder is I 
I the same as the sign of the divisor. Halt if the divisor is zero. I 



I call: divide 



RO 
Rl 



Divisor . 
Dividend. 



I Out : RO 

I Rl 

I 

ICriticality 1. 

I 

I Alters R0/R1. 



Quotient . 
Remainder 



Divi 


Lde: 


MOVE 


R2, [TempDiv R2.A0) 






MOVE 


R3, [TempDiv R3.A0] 






CHECK 


R0,INT,R2 






BF 


R2,~Div Nonlnteger 






CHECK 


R1,INT,R2 






BF 


R2,*Div Nonlnteger 






BZ 


RO, ~Div_Zero 






BNZ 


Rl,"Div DividendNZ 






MOVE 


0,R0 






BR 


A Div Done 


Div 


DividendNZ: 


LT 


R0,0,R2 






EQUAL 


Rl, $80000000, R3 






MOVE 


R3, [TempDiv 80000000, A0] 






BF 


R3,*Div Normal 






EQUAL 


R0.-1.R3 






BT 


R3,~Div Overflow 






ADD 


R1,R0,R1 






BF 


R2,~Div Normal 






SUB 


R1,R0,R1 






SUB 


R1,R0,R1 


Div 


Normal: 


LT 


R1,0,R3 






BF 


R2,~Div DivisorPos 






NEQUAL 


R0, $80000000, R2 






BT 


R2,~Div DivisorNeg 






MOVE 


0,R0 






BT 


R3,~Div Donel 






MOVE 


-1,R0 






ADD 


Rl, $80000000, Rl 






BR 


~Div Donel 


Div 


_DivisorNeg: 


NEG 


R0,R0 






NEG 


R1.R1 






NOT 


R3,R3 


Div 


DivisorPos: 


BF 


R3,"Div DividendPos 






NEG 


R1,R1 


Div 


DividendPos 


: MOVE 


R2,FOP0 






MOVE 


R3.FOP1 






GT 


R0,R1,R2 






MOVE 


R0.R3 






MOVE 


0,R0 






BT 


R2,*Div Done3 






FFB 


R1,R2 






MOVE 


R2, [TempDiv Count, A0] 






FFB 


R3,R2 






SUB 


R2, [TempDiv Count, A0],R2 






LSH 


R3,R2,R3 






MOVE 


R2, [TempDiv_Count,A01 






BR 


~Div Loop 1 



Criticality 6. Save R2 and R3 . 
Check for futures and bad types. 

Halt if the divisor is zero. 

If the dividend was zero, return a zero quotient and 
remainder . 

■R2 is true if the divisor is negative. 
TempDiv_8C00Q000 is true if the dividend was S800Q0OOO. 

■In this case, when the divisor is -1, the division overflows 

■The dividend is $80000000. 

:When the divisor is positive, add it to the dividend; 

■when the divisor is negative, subtract it from the dividend. 

■The reverse adjustment will be made on the quotient later. 

■R3 is true if the dividend is negative. 

:The divisor is -580000000. 

:When the dividend is positive, the quotient is -1. 

:When the dividend is negative, the quotient is 0. 



If the divisor was negative, negate both it and the dividend. 



If the dividend is now negative, negate only it. 

Now both the divisor and the dividend should be positive 

{and no greater than $7FFFFFFF) . 

■Move the divisor to R3 . 

■From now on R0 contains the quotient. 



:R2 contains the number of extra bits of magnitude in the 
.-dividend over the divisor. 

:Shift the divisor so that its most significant bit is in the 
rsame position as the dividend's. 



Div_Loop:         SUB     R2,1,R2
                  MOVE    R2,[TempDiv_Count,A0]
                  ADD     R0,R0,R0                ;Shift the quotient to the left and the divisor to the right.
                  LSH     R3,-1,R3
Div_Loop_1:       LT      R1,R3,R2                ;Try subtracting the shifted divisor from the dividend.
                  BT      R2,^Div_Loop_2
                  SUB     R1,R3,R1
                  ADD     R0,1,R0                 ;If successful, increment the quotient.
Div_Loop_2:       MOVE    [TempDiv_Count,A0],R2
                  BNZ     R2,^Div_Loop
Div_Done3:        MOVE    FOP1,R2                 ;If the dividend was negative, negate the quotient;
                  BF      R2,^Div_Done2           ;if the remainder was positive, subtract the remainder
                  NEG     R0,R0                   ;from the divisor and subtract one from the quotient to keep the
                  BZ      R1,^Div_Done2           ;remainder positive.
                  SUB     R0,1,R0
                  SUB     R3,R1,R1
Div_Done2:        MOVE    FOP0,R2                 ;If the divisor was negative, negate the remainder.
                  BF      R2,^Div_Done1
                  NEG     R1,R1
Div_Done1:        MOVE    [TempDiv_80000000,A0],R3 ;The dividend was $80000000.  Perform the quotient adjustment.
                  BF      R3,^Div_Done
                  MOVE    FOP0,R2
                  SUB     R0,1,R0                 ;When the divisor was positive, subtract 1 from the quotient;
                  BF      R2,^Div_Done
                  ADD     R0,2,R0                 ;When the divisor was negative, add 1 to the quotient.
Div_Done:         MOVE    [TempDiv_R2,A0],R2      ;Restore registers and return.
                  MOVE    [TempDiv_R3,A0],R3
                  MOVE    FIP,IP
Div_Zero:         HALT    haltDiv0
Div_Overflow:     HALT    haltOverflow
Div_NonInteger:   CHECK   R0,FUT,R2               ;If either operand was a future, crash with the future fault;
                  BT      R2,^Div_Future
                  CHECK   R1,FUT,R2
                  BT      R2,^Div_Future
                  HALT    haltType                ;otherwise, crash with the type fault.
Div_Future:       HALT    haltFuture

fltDivide         = IP:abs|fault|unchecked|Divide<<offsetN
fltCrashOverflow  = IP:abs|fault|unchecked|Div_Overflow<<offsetN
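
The sign handling in the divide routine is easier to follow in a high-level sketch. The Python below is an illustration under stated assumptions, not a transcription of the listing: the name `floor_divide` is hypothetical, Python's unbounded integers make the $80000000 special case unnecessary, and a zero divisor raises an exception where the listing halts. It normalizes both operands to non-negative values, runs the shift-and-subtract loop, and then applies the quotient and remainder adjustments that the comments describe, so the remainder ends up with the sign of the divisor.

```python
def floor_divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Shift-and-subtract division; the remainder takes the sign of the divisor."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")  # the listing halts here instead
    divisor_neg = divisor < 0
    if divisor_neg:
        # If the divisor was negative, negate both it and the dividend.
        dividend, divisor = -dividend, -divisor
    dividend_neg = dividend < 0
    if dividend_neg:
        # If the dividend is now negative, negate only it.
        dividend = -dividend

    quotient, remainder = 0, dividend
    if remainder >= divisor:
        # Align the divisor's most significant bit with the dividend's.
        shift = remainder.bit_length() - divisor.bit_length()
        shifted = divisor << shift
        for _ in range(shift + 1):
            quotient <<= 1
            if remainder >= shifted:      # try subtracting the shifted divisor
                remainder -= shifted
                quotient += 1
            shifted >>= 1

    if dividend_neg:
        quotient = -quotient
        if remainder != 0:
            # Keep the remainder non-negative at this stage.
            quotient -= 1
            remainder = divisor - remainder
    if divisor_neg:
        remainder = -remainder
    return quotient, remainder
```

Under these assumptions the result agrees with Python's built-in `divmod` for every non-zero divisor, which is a convenient way to check the adjustment steps.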



| Allocate and initialize a new closure.
|
| Call:  newClosure
|         First word of object.
|   AID2  Object.
|   R1    Object's ID.
| Criticality 3.
|
| Alters R0-R3, AID2.

NewClosure:       MOVE    FIP,R1                  ;Criticality 6.
                  MOVE    R1,F                    ;Criticality 3.
                  MOVE    R1,[TempNCl_FIP,A0]
                  CALL    allocNextObject
                  MOVE    R0,R1
                  DC      (%11000100000100000|(callClosure&%100000)<<4|callClosure&%011111)<<17
                  WTAG    R0,INST3,R0             ;Install the faulter instruction.
                  MOVE    R0,[oClosureCode,A2]
                  MOVE    [TempNCl_FIP,A0],IP

fltNewClosure     = IP:abs|fault|unchecked|NewClosure<<offsetN



| Call the function in a closure.  This routine does not return.
|
| Call:  callClosure
|
| Criticality 0.

CallClosure:      MOVE    A0,R3                   ;Criticality 5.
                  MOVE    R3,A2                   ;Copy A0 to A2.
                  DC      MSG:msgApplyFunction
                  MOVE    [oFunctionNArgs,A2],R1
                  OR      R0,R1,R0
                  MOVE    A3,R2                   ;Mask the length of the object pointed by A3
                  OR      R2,lengthM,R2           ;to nArgs words.
                  XOR     R2,lengthM,R2
                  OR      R2,R1,R2
                  MOVE    R2,A3
                  SUB     R3,1,R2
                  AND     R3,lengthM,R3           ;Put the number of display arguments in R3.
                  SUB     R3,oClosureDisplay,R3
                  ADD     R0,R3,R0                ;Update the length of the message.
                  MOVE    NNR,R1                  ;Send the message back to this node.
                  SEND20  R1,R0
                  SEND0   [oClosureFunct,A2]      ;Send the real function.
                  DC      IP:abs|unchecked|CallClosure_2<<offsetN
                  MOVE    R0,[LimitOverride,A0]   ;Override the Limit fault.
                  SEND0   [2,A3]                  ;Send the rest of the arguments.
                  SEND0   [3,A3]
                  SEND0   [4,A3]
                  SEND0   [5,A3]
                  SEND0   [6,A3]
                  SEND0   [7,A3]
                  SEND0   [8,A3]
                  SEND0   [9,A3]
                  SEND0   [10,A3]
                  SEND0   [11,A3]
                  SEND0   [12,A3]
                  SEND0   [13,A3]
                  SEND0   [14,A3]
                  SEND0   [15,A3]
                  MOVE    16,R0
CC1_SendRest:     SEND0   [R0,A3]                 ;Sends more arguments.
                  ADD     R0,1,R0
                  SEND0   [R0,A3]
                  ADD     R0,1,R0
                  SEND0   [R0,A3]
                  ADD     R0,1,R0
                  SEND0   [R0,A3]
                  ADD     R0,1,R0
                  BR      ^CC1_SendRest
CallClosure_2:    AND     R2,lengthM,R3
                  MOVE    [R3,A2],R3              ;Get the last display argument.
                  MOVE    R2,A2                   ;Decrement the length of the display by one.
                  MOVE    2,R0                    ;Send the display arguments.
                  CALL    blockSend
                  SENDE0  R3
                  CALL    suspend

fltCallClosure    = IP:abs|unchecked|CallClosure<<offsetN
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t«*llf<llll«lltll<<lltl 
II tt 

## Control Manager ##
It <» 

(IIIIIIMtillllllllllll 



| Execute an Apply, ApplyFunction, or ApplySelector message.

Apply:            MOVE    [applyFunct,A3],R1      ;Criticality 0.  Get the funct.
                  CHECK   R1,TAG0,R2              ;If it has tag 0, assume it is a selector.
                  BT      R2,^ApplySelector
                  CHECK   R1,ID,R2                ;If it has tag ID, assume it is a function.
                  BT      R2,^ApplyFunction
                  HALT    haltApply               ;Otherwise the message was invalid.
ApplyFunction:    MOVE    [applyFunct,A3],R0      ;Criticality 0.  Get the function.
                  XLATE   R0,objectXLATE,A0
                  DC      IP:oFunctionCode<<offsetN ;Start executing at the second word of the function.
                  MOVE    R0,IP

ApplySelector:    MOVE    [applyReceiver,A3],R0   ;Criticality 0.  Get the receiver.
                  PROBE   R0,R1                   ;Probe it, hoping it is an ID or DID.
                  BNIL    R1,^AS_Miss
                  MOVE    R1,A2                   ;If so, point A2 to the instance object.
                  MOVE    R0,ID2
                  MOVE    [objectHeader,A2],R0    ;Extract the class from the object header.
                  WTAG    R0,INT,R0
                  ROT     R0,-hdrClassN,R0
                  AND     R0,hdrClassM,R0
AS_1:             MOVE    [applyFunct,A3],R1      ;Get the selector.
                  CALL    lookupMethodU           ;R0 now contains INT:class.
                  DC      IP:oFunctionCode<<offsetN
                  XLATE   R2,objectXLATE,A0
                  MOVE    R0,IP
AS_Miss:          CALL    typeOf                  ;Call the real class-extraction routine.
                  BR      ^AS_1

msgApply          = Apply<<offsetN
msgApplyFunction  = ApplyFunction<<offsetN
msgApplySelector  = ApplySelector<<offsetN
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IMIIIMIIIMIIIflllll 
It #» 

## Initialization ##
tt #1 

Mf«i«<i««<t44t«tiii(l 



| Initialize the MDP.

InitializeMDP:    DC      ADDR:invalid            ;Clear the user address and ID registers.
                  MOVE    R0,A0
                  MOVE    R0,A1
                  MOVE    R0,A2
                  MOVE    R0,A3
                  MOVE    R0,A0B
                  MOVE    R0,A1B
                  MOVE    R0,A2B
                  MOVE    R0,A3B
                  MOVE    R0,A0B'
                  MOVE    R0,A1B'
                  MOVE    R0,A2B'
                  MOVE    R0,A3B'
                  MOVE    NIL,R0
                  MOVE    R0,ID0
                  MOVE    R0,ID1
                  MOVE    R0,ID2
                  MOVE    R0,ID3
                  MOVE    R0,ID0B
                  MOVE    R0,ID1B
                  MOVE    R0,ID2B
                  MOVE    R0,ID3B
                  MOVE    R0,ID0B'
                  MOVE    R0,ID1B'
                  MOVE    R0,ID2B'
                  MOVE    R0,ID3B'
                  MOVE    -1,R1                   ;Clear all globals to CFUT:-1.
                  WTAG    R1,CFUT,R1              ;R1 contains CFUT:-1.
                  DC      64
IMDP_ClrGlobals:  SUB     R0,1,R0
                  MOVE    R1,[R0,A0]
                  BNZ     R0,^IMDP_ClrGlobals
                  DC      ADDR:Queue1Start<<baseN ;Initialize the queues.
                  MOVE    R0,QHL'
                  DC      ADDR:Queue1Start<<baseN|(Queue1End-Queue1Start-1)
                  MOVE    R0,QBM'
                  DC      ADDR:Queue0Start<<baseN
                  MOVE    R0,QHL
                  DC      ADDR:Queue0Start<<baseN|(Queue0End-Queue0Start-1)
                  MOVE    R0,QBM
                  MOVE    NIL,R2                  ;R2 contains NIL.
                  DC      ADDR:XlateStart<<baseN|(XlateEnd-XlateStart-1)
                  MOVE    R2,[LimitOverride,A0]   ;Initialize LimitOverride.
                  MOVE    R0,TBM                  ;Initialize the xlate table.
                  DC      ADDR:XlateStart<<baseN
                  MOVE    R2,[FastContextQueue,A0] ;Initialize FastContextQueue.
                  MOVE    R0,A2
                  DC      XlateEnd-XlateStart
IMDP_ClrXlate:    SUB     R0,1,R0                 ;Clear every entry in the table to NIL.
                  MOVE    R2,[R0,A2]
                  BNZ     R0,^IMDP_ClrXlate
                  MOVE    R2,[BRATFree,A0]        ;Initialize BRATFree.
                  DC      ADDR:BRATStart<<baseN   ;Clear the BRAT.
                  MOVE    R0,A2
                  DC      BRATEnd-BRATStart
IMDP_ClrBrat:     SUB     R0,1,R0
                  MOVE    R2,[R0,A2]
                  BNZ     R0,^IMDP_ClrBrat
                  DC      FixedHeapStart          ;Initialize the heap.
                  MOVE    R0,[HeapStart,A0]
                  MOVE    R0,[FirstFree,A0]
                  MOVE    R0,R3
                  DC      HeapEnd
                  MOVE    R0,[LastFree,A0]
                  IF      !FASTSIM
IMDP_ClrHeap:     MOVE    R1,[R3,A0]              ;Clear the heap to CFUT:-1.
                  ADD     R3,1,R3
                  GE      R3,R0,R2
                  BF      R2,^IMDP_ClrHeap
                  END
                  MOVE    NNR,R2                  ;Initialize RandomSeed and SerialNode.
                  MOVE    R2,[RandomSeed,A0]
                  DC      nodeMask
                  MOVE    R0,[NodeMask,A0]
                  AND     R2,xM,R3                ;Calculate this node's serial number from the NNR value.
                  ROT     R2,-yN,R0
                  AND     R0,yM,R0
                  ROT     R0,xL,R0
                  OR      R3,R0,R3
                  ROT     R2,-zN,R0
                  AND     R0,zM,R0
                  ROT     R0,xL+yL,R0
                  OR      R3,R0,R3
                  MOVE    R3,[SerialNode,A0]
                  DC      ID:(nFastContexts-1)<<serialN
                  OR      R0,R2,R0                ;Initialize LastObjectID and NextDistobjID.
                  MOVE    R0,[LastObjectID,A0]
                  MOVE    0,R0
                  MOVE    R0,[NextDistobjID,A0]
                  MOVE    nFastContexts,R3        ;Make nFastContexts fast contexts.
IMDP_MakeFast:    SUB     R3,1,R3
                  MOVE    R3,[TempINITM_Context,A0] ;Save the number of fast contexts yet to be made.
                  ROT     R3,serialN,R1
                  MOVE    NNR,R3                  ;Put the node number into the context ID.
                  OR      R1,R3,R1
                  WTAG    R1,ID,R1
                  DC      OBJ:hdrLocked|hdrFast|contextSize
                  CALL    allocObject
                  XOR     R0,rel,R0               ;Make the fast context ADDRs nonrelocatable.
                  ENTER   R1,R0
                  CALL    enterBinding
                  MOVE    [FastContextQueue,A0],R0
                  MOVE    R0,[contextNext,A2]
                  MOVE    ID2,R0
                  MOVE    R0,[FastContextQueue,A0]
                  MOVE    [TempINITM_Context,A0],R3
                  BNZ     R3,^IMDP_MakeFast
                  MOVE    R0,ID1B                 ;Initialize priority 0's A1D1 to a fast context.
                  MOVE    A2,R1
                  MOVE    R1,A1B
                  MOVE    [contextNext,A2],R0
                  MOVE    R0,[FastContextQueue,A0]
                  MOVE    [FirstFree,A0],R0       ;The real heap starts after the fast contexts.
                  MOVE    R0,[HeapStart,A0]
                  MOVE    FALSE,R0                ;Enable message reception.
                  MOVE    R0,
                  IF      !REALMODE
                  STOP
                  END
IMDP_Background:  BR      ^IMDP_Background        ;Do nothing in the background mode.
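
The serial-number calculation in the initialization code packs the x, y, and z coordinates of the node number (NNR) into one contiguous integer. The Python below is a minimal sketch of that packing. The function name and the field positions and widths passed as arguments are assumptions made for illustration; the real values of `xM`, `yN`, `yM`, `zN`, `zM`, `xL`, and `yL` are defined elsewhere in Cosmos and are not shown in this listing.

```python
def serial_node(nnr: int, x_len: int, y_len: int, z_len: int,
                y_pos: int, z_pos: int) -> int:
    """Pack the x, y and z fields of a node number into a dense serial number.

    x is assumed to start at bit 0 of nnr; y_pos and z_pos are the
    (hypothetical) bit positions of the y and z fields.
    """
    x = nnr & ((1 << x_len) - 1)               # AND  R2,xM,R3
    y = (nnr >> y_pos) & ((1 << y_len) - 1)    # ROT  R2,-yN / AND yM
    z = (nnr >> z_pos) & ((1 << z_len) - 1)    # ROT  R2,-zN / AND zM
    # y moves up by the width of x, z by the combined width of x and y.
    return x | (y << x_len) | (z << (x_len + y_len))
```

With a 4 x 2 x 2 machine whose y and z fields sit at bits 5 and 10, for example, node (x=3, y=1, z=1) gets serial number 15, the last of the 16 nodes.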



OSEnd:
FixedHeapStart:

                  ORG     Faults0Start
;Priority 0 faults:
                  DC      fltCrash0               ;CATASTROPHE
                  DC      fltCrash0               ;INTERRUPT
                  DC      fltCrash0               ;QUEUE
                  DC      fltSend                 ;SEND
                  DC      fltCrash0               ;ILGINST
                  DC      fltCrash0               ;DRAMERR
                  DC      fltINVADR               ;INVADR
                  DC      fltCrashType            ;ADRTYPE
                  DC      fltLimit                ;LIMIT
                  DC      fltEarly                ;EARLY
                  DC      fltCrash0               ;MSG
                  DC      fltXLATE                ;XLATE
                  DC      fltCrashOverflow        ;OVERFLOW
                  DC      fltCFUT                 ;CFUT
                  DC      fltCrashFuture          ;FUT
                  DC      fltCrashType            ;TAG8 = ID
                  DC      fltCrashType            ;TAG9 = DID
                  DC      fltCrashType            ;TAGA
                  DC      fltCrashType            ;TAGB = FLOAT
                  DC      fltCrashType            ;TYPE
                  DC      fltCrash0               ;$14
                  DC      fltCrash0               ;$15
                  DC      fltCrash0               ;$16
                  DC      fltCrash0               ;$17
                  DC      fltCrash0               ;$18
                  DC      fltCrash0               ;$19
                  DC      fltCrash0               ;$1A
                  DC      fltCrash0               ;$1B
                  DC      fltCrash0               ;$1C
                  DC      fltCrash0               ;$1D
                  DC      fltCrash0               ;$1E
                  DC      fltCrash0               ;$1F
;Priority 1 faults:
                  DC      fltCrash1               ;CATASTROPHE
                  DC      fltCrash1               ;INTERRUPT
                  DC      fltCrash1               ;QUEUE
                  DC      fltCrash1               ;SEND
                  DC      fltCrash1               ;ILGINST
                  DC      fltCrash1               ;DRAMERR
                  DC      fltCrash1               ;INVADR
                  DC      fltCrash1               ;ADRTYPE
                  DC      fltCrash1               ;LIMIT
                  DC      fltCrash1               ;EARLY
                  DC      fltCrash1               ;MSG
                  DC      fltCrash1               ;XLATE
                  DC      fltCrash1               ;OVERFLOW
                  DC      fltCrash1               ;CFUT
                  DC      fltCrash1               ;FUT
                  DC      fltCrash1               ;TAG8 = ID
                  DC      fltCrash1               ;TAG9 = DID
                  DC      fltCrash1               ;TAGA
                  DC      fltCrash1               ;TAGB
                  DC      fltCrash1               ;TYPE
                  DC      fltCrash1               ;$14
                  DC      fltCrash1               ;$15
                  DC      fltCrash1               ;$16
                  DC      fltCrash1               ;$17
                  DC      fltCrash1               ;$18
                  DC      fltCrash1               ;$19
                  DC      fltCrash1               ;$1A
                  DC      fltCrash1               ;$1B
                  DC      fltCrash1               ;$1C
                  DC      fltCrash1               ;$1D
                  DC      fltCrash1               ;$1E
                  DC      fltCrash1               ;$1F
;System calls:
                  DC      fltSuspend              ;$00
                  DC      fltBlockMove            ;$01
                  DC      fltBlockSend            ;$02
                  DC      fltCompactHeap          ;$03
                  DC      fltAllocObject          ;$04
                  DC      fltEnterBinding         ;$05
                  DC      fltLookupBinding        ;$06
                  DC      fltDeleteBinding        ;$07
                  DC      fltPurgeBinding         ;$08
                  DC      fltNewLocalObject       ;$09
                  DC      fltAllocNextObject      ;$0A
                  DC      fltAllocNewObject       ;$0B
                  DC      fltNewContext           ;$0C
                  DC      fltDisposeContext       ;$0D
                  DC      fltDisposeObject        ;$0E
                  DC      fltDeallocateObject     ;$0F
                  DC      fltNewObject            ;$10
                  DC      fltClassOf              ;$11
                  DC      fltTypeOf               ;$12
                  DC      fltObjectNode           ;$13
                  DC      fltPreferredConstituent ;$14
                  DC      fltCO                   ;$15
                  DC      fltLookupMethod         ;$16
                  DC      fltLookupMethodU        ;$17
                  DC      fltDivide               ;$18
                  DC      fltNewClosure           ;$19
                  DC      fltCallClosure          ;$1A
                  DC      fltCrashCall            ;$1B
                  DC      fltCrashCall            ;$1C
                  DC      fltCrashCall            ;$1D
                  DC      fltCrashCall            ;$1E
                  DC      fltCrashCall            ;$1F
                  DC      fltCrashCall            ;$20
                  DC      fltCrashCall            ;$21
                  DC      fltCrashCall            ;$22
                  DC      fltCrashCall            ;$23
                  DC      fltCrashCall            ;$24
                  DC      fltCrashCall            ;$25
                  DC      fltCrashCall            ;$26
                  DC      fltCrashCall            ;$27
                  DC      fltCrashCall            ;$28
                  DC      fltCrashCall            ;$29
                  DC      fltCrashCall            ;$2A
                  DC      fltCrashCall            ;$2B
                  DC      fltCrashCall            ;$2C
                  DC      fltCrashCall            ;$2D
                  DC      fltCrashCall            ;$2E
                  DC      fltCrashCall            ;$2F
                  DC      fltCrashCall            ;$30
                  DC      fltCrashCall            ;$31
                  DC      fltCrashCall            ;$32
                  DC      fltCrashCall            ;$33
                  DC      fltCrashCall            ;$34
                  DC      fltCrashCall            ;$35
                  DC      fltCrashCall            ;$36
                  DC      fltCrashCall            ;$37
                  DC      fltCrashCall            ;$38
                  DC      fltCrashCall            ;$39
                  DC      fltCrashCall            ;$3A
                  DC      fltCrashCall            ;$3B
                  DC      fltCrashCall            ;$3C
                  DC      fltCrashCall            ;$3D
                  DC      fltCrashCall            ;$3E
                  DC      fltCrashCall            ;$3F



                  END

                  BREAK   HAZARDS                         ;Break on hazards.
                  IF      !REALMODE
                  BREAK   FAULT Faults0Start,Faults1Start ;Break on catastrophic faults.
                  BREAK   READ WRITE OSStart..OSEnd-1     ;Protect operating system code.
                  BREAK   FETCH 0..OSStart                ;Globals cannot be executed.
                  IGNORE  FETCH $400,$401                 ;The initialization code, however, can.
                  BREAK   READ WRITE Faults0Start..CallsEnd-1 ;Fault vectors are protected.
                  STEP    300                             ;Allow the operating system to write globals.
                  BREAK   FETCH $400,$401                 ;The initialization code is now gone.
                  BREAK   READ WRITE 0..3                 ;Locations 0 through 3 are not used for anything.
                  END

                  INCLUDE "Runtime.m"                     ;Load the run-time system.

                  IF      !REALMODE
                  RUN                                     ;Initialize the operating system.
                  END
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Runtime.m 



MDP Operating System 
version 2.3 



written by 
Waldemar Horwat 



Master's thesis under Prof. William Dally 



March 28, 1989 
May 1991 



Send problems and comments to 
waldemar@hx.lcs.mit.edu. 



Copyright 1989, 1990, 1991 Waldemar Horwat 



;The download header is appended to the beginning of every module that is downloaded. 

MODULE DOWNLOADHEADER 

DC MSG:msgAcceptObject 1+2 

DC IONODE 

END 



                  MODULE  LookupMethod
vCurrentClass     = 6                             ;Class number of superclass currently scanned.
Begin0:           DC      OBJ:hdrCopyable|classFunction<<hdrClassN|End0-Begin0
                  DC      {LookupMethod}
                  DC      5
                  MOVE    [lookMethSelector,A3],R0
                  XLATE   R0,objectXLATE,A2       ;Point AID2 to the selector object.
                  MOVE    [lookMethClass,A3],R3   ;Store the class in R3.
                  MOVE    [lookMethReplyID,A3],R0 ;Save the reply ID in the context.
                  MOVE    R0,[lookMethReplyID,A1]
                  MOVE    FALSE,R0                ;Turn off A3 queue wraparound.
                  MOVE    R0,Q
                  MOVE    R0,[vCurrentClass,A1]
                  MOVE    [oSelNMethods,A2],R2    ;Search the class/method associations for the
                  MOVE    oSelMethods,R1          ;class in R3.
                  BZ      R2,^Miss1
Search1:          EQ      R3,[R1,A2],R0
                  BT      R0,^FoundMethod
                  ADD     R1,2,R1
                  SUB     R2,1,R2
                  BNZ     R2,^Search1
Miss1:            XLATE   R3,objectXLATE,A3       ;If no association was found, scan the class's
                  MOVE    0,R0                    ;superclasses.
Miss2:            ADD     R0,1,R0
                  GE      R0,[oClassNAllSupers,A3],R1 ;Return NIL if an association still wasn't found.
                  BT      R1,^MissAll
                  MOVE    R0,[vCurrentClass,A1]
                  ADD     R0,oClassAllSupers,R0
                  MOVE    [R0,A3],R3
                  MOVE    [oSelNMethods,A2],R2    ;Search the class/method associations for the
                  MOVE    oSelMethods,R1          ;class in R3.
Search2:          EQ      R3,[R1,A2],R0
                  BT      R0,^FoundMethod
                  ADD     R1,2,R1
                  SUB     R2,1,R2
                  BNZ     R2,^Search2
                  MOVE    [vCurrentClass,A1],R0
                  BR      ^Miss2
MissAll:          MOVE    NIL,R2                  ;No method was found, so return NIL.
                  BR      ^FoundMethod2
FoundMethod:      ADD     R1,1,R1                 ;Extract the method ID.
                  MOVE    [R1,A2],R2
FoundMethod2:     MOVE    [lookMethReplyID,A1],R1 ;Return a reply message with the method ID.
                  DC      MSG:msgMethodReply+3
                  SEND20  R1,R0
                  SEND2E0 R1,R2
                  SUSPEND
                  END
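
The lookup strategy in LookupMethod (try the receiver's class first, then each of its superclasses, and reply NIL if nothing matches) can be sketched in a few lines of Python. This is an illustration only: the function name and the two data structures are assumptions standing in for the selector object's class/method association list and the class object's list of all superclasses, whose real layouts are defined by Cosmos.

```python
from typing import Optional


def lookup_method(selector_methods: list[tuple[str, str]],
                  class_supers: dict[str, list[str]],
                  cls: str) -> Optional[str]:
    """Return the method bound to cls or to its nearest superclass, else None.

    selector_methods: (class, method) associations stored with one selector.
    class_supers:     for each class, all of its superclasses, nearest first.
    """
    for candidate in [cls] + class_supers.get(cls, []):
        # Linear scan of the selector's associations, as in Search1/Search2.
        for owner, method in selector_methods:
            if owner == candidate:
                return method
    return None  # corresponds to replying with NIL
```

Keeping the associations with the selector rather than with the class means that the scan touches one selector object plus one class object, which is the access pattern the listing follows.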



;NewDistobj message:

LABEL newDistobjClass = 2

LABEL newDistobjSize = 3

LABEL newDistobjReturnID = 4

LABEL newDistobjReturnSlot = 5

LABEL newDistobjTemp = 6 ;Temporary



                  MODULE  f_New_Distobj
Begin:            DC      OBJ:hdrCopyable|classFunction<<hdrClassN|End-Begin
                  DC      {f_New_Distobj}
                  DC      6
                  MOVE    NNR,R1
                  EQUAL   R1,0,R1
                  BT      R1,^OnNode0
                  DC      MSG:msgApplyFunction|6
                  SEND20  0,R0                    ;If not, forward this message to node 0.
                  DC      {f_New_Distobj}
                  SEND0   R0
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Cosmos Listing 






                  SEND0   [newDistobjClass,A3]
                  SEND0   [newDistobjSize,A3]
                  SEND0   [newDistobjReturnID,A3]
                  SENDE0  [newDistobjReturnSlot,A3]
                  SUSPEND
OnNode0:          MOVE    [newDistobjSize,A3],R0  ;Put max(size,1) into R0.
                  GT      R0,0,R1
                  BT      R1,^PositiveSize
                  MOVE    1,R0
PositiveSize:     SUB     R0,1,R0
                  FFB     R0,R1
                  MOVE    31,R0
                  SUB     R0,R1,R0
                  ADD     R0,-LogNNodes,R1
                  NEG     R1,R3
                  MOVE    -1,R2
                  ASH     R2,R3,R2
                  NOT     R2,R2
                  DC      ADDR:64
                  MOVE    R0,A2
                  MOVE    [RandomSeed,A2],R3
                  ADD     R3,3,R3
                  MOVE    R3,[RandomSeed,A2]
                  AND     R2,R3,R2
                  MOVE    [NextDistobjID,A2],R3
                  MOVE    -1,R0
                  ASH     R0,R1,R0
                  NOT     R0,R0
                  ADD     R0,1,R0
                  ADD     R3,R0,R3
                  MOVE    R3,[NextDistobjID,A2]
                  SUB     R3,R0,R3
                  ROT     R3,serialN,R3

                  OR      R3,distobjMember,R3
                  OR      R3,R2,R3
                  NEG     R1,R2
                  ASH     R2,logStrideN,R2
                  AND     R2,(1<<logStrideN+logStrideL)-1,R2
                  OR      R3,R2,R3
                  WTAG    R3,DID,R3
                  MOVE    LogNNodes,R2
                  ADD     R2,R1,R2
                  MOVE    R2,[newDistobjTemp,A1]
                  MOVE    0,R0
                  MOVE    R3,R1
                  CALL
                  DC      MSG:msgApplyFunction|9
                  SEND20  R1,R0
                  DC      {fNewDistobjTree}
                  SEND0   R0
                  SEND0   [newDistobjClass,A3]
                  SEND20  [newDistobjSize,A3],R3
                  SEND0
                  SEND0   [newDistobjTemp,A1]
                  SEND0   [newDistobjReturnID,A3]
                  SENDE0  [newDistobjReturnSlot,A3]
                  SUSPEND

;Calculate lg(max(size,1)) and store it in R0.
;R1 contains -stride.
;R3 contains stride.
;Point A2 to the global area.
;Criticality 2.
;R2 contains the offset.
;Get the ID for this distributed object.
;Advance the ID counter by the number of constituents
;per node.
;Criticality 1.
;Calculate the DID for this distributed object and store it
;in R3.
;Put lg(max(size,1)) into newDistobjTemp.
;Send a newDistobjTree message to the node that will contain the
;first constituent of the distributed object.



END 
REF REV f_New_Distobj 
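
The size arithmetic in f_New_Distobj can be sketched in Python. This is a hedged illustration: the function names are hypothetical, and it assumes that the SUB / FFB / `31 - x` sequence yields the ceiling of log2(max(size, 1)) and that the ASH / NOT / ADD sequence yields the number of constituents per node, which is how the comments in the listing read.

```python
def ceil_lg(size: int) -> int:
    """Ceiling of log2(max(size, 1)), computed from the bit length of size-1."""
    return (max(size, 1) - 1).bit_length()


def constituents_per_node(size: int, log_n_nodes: int) -> int:
    """Constituents each node holds when the object is spread over the nodes.

    log_n_nodes plays the role of LogNNodes; the result never drops below 1.
    """
    return 1 << max(0, ceil_lg(size) - log_n_nodes)
```

For a 64-node machine (`log_n_nodes = 6`), a 1024-element distributed object would place 16 constituents on each node under these assumptions, while any object of 64 or fewer elements would place at most one per node.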



ID: (1<<30I (Umx)«sXI (limr>«sr| UsmZ)<<sZI (14mS)<<sS) 



;NewDistobjTree message:
LABEL newDistobjTreeClass = 2
LABEL newDistobjTreeSize = 3
LABEL newDistobjTreeID = 4
LABEL newDistobjTreeStart = 5
LABEL newDistobjTreeLogDelta =
LABEL newDistobjTreeReturnID =
LABEL newDistobjTreeReturnSlot



                  MODULE  fNewDistobjTree
Begin2:           DC      OBJ:hdrCopyable|classFunction<<hdrClassN|End2-Begin2
                  DC      {fNewDistobjTree}
                  DC      9
                  MOVE    [newDistobjTreeLogDelta,A3],R3
                  BZ      R3,^Leaf
                  SUB     R3,1,R3
                  MOVE    NNR,R1
                  DC      MSG:msgApplyFunction|9
                  SEND20  R1,R0                   ;Call fNewDistobjTree twice, each time on half of the range
                  DC      {fNewDistobjTree}       ;of constituents.
                  SEND0   R0
                  SEND0   [newDistobjTreeClass,A3]
                  SEND0   [newDistobjTreeSize,A3]
                  SEND0   [newDistobjTreeID,A3]
                  SEND20  [newDistobjTreeStart,A3],
                  MOVE    9,R0
                  SEND2E0 [contextID,A1],R0
                  WTAG    R0,CFUT,R0              ;Make a cfuture in R0.
                  MOVE    R0,[9,A1]
                  MOVE    [newDistobjTreeID,A3],R1
                  MOVE    [newDistobjTreeStart,A3],
                  MOVE    1,R2
                  ASH     R2,R3,R2
                  ADD     R0,R2,R0
                  MOVE    R0,[11,A1]
                  CALL    CO
                  DC      MSG:msgApplyFunction|9
                  SEND20  R1,R0
                  DC      {fNewDistobjTree}
                  SEND0   R0
                  SEND0   [newDistobjTreeClass,A3]
                  SEND0   [newDistobjTreeSize,A3]
                  SEND0   [newDistobjTreeID,A3]
                  SEND20  [11,A1],R3
                  MOVE    10,R0
                  SEND2E0 [contextID,A1],R0
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! 131, 132, 146, 168 

#' 132, 133 

"133 

#: 132, 133 

#\ 133 

#L168 

%131 

& 131, 132 

&cas-er 138 

&cwriter 138 

&guard 168 

&immutable 138 

&inline 138, 139, 140, 141, 147, 148 
&name 140 
&no-leak 140 
&non-strict 141 
&non-strict 144, 145 
&not-inline 138, 140, 141, 147, 148 
&not-inline-default 139 
&predicate 138 
&reader 138 
&side-effect-free 141 
&value 140, 141 
&writer 138 
'132 
*161 
+ 161 
-161 
/161 
//161 

137, 140, 141, 147, 148 

:141 

15, 134 
<160 
<=160 
<>160 
= 160 
>160 
>=160 

? 131, 132, 149, 168 
@ 132, 168 
[]148 
\ 133 
A 131 

_ 131, 140, 147 
Abstract Class 130 
Abstract Method 130 
Acquire 163 
Add-method 143 
And 160 

Short-Circuit 161 
Application 146 



Argument 14, 140 

Evaluation Order 145 

Passing Convention 140 
Array 152, 164 

Boolean 164 

Integer 164 
Ash 162 
Become 129 

Begin 17, 135, 144, 149, 170 
Block 131, 150 
BNF 130 
Body 144 
Boolean 152 
Boolean-Array 152, 164 
Busy? 163 
CAP 149 
Car 15 

CAS 147, 149 
CAS-er Method 138 
Cdr 15 
Cfuture 19, 144 

Semantics 144 
Char-ready? 167 
Character 133, 152 
Class 15, 133, 137, 152 

Abstract 130 

Assertion 146 

Built-in 137, 152 
Hierarchy 153 

Immutable 138 

Inheritance 137 

Inline 139 

Inquiry 156 

Metaclass 152 

Predicate 15, 138 
Class-kind? 157 
Class-of 156 
Clet 17, 18, 131, 147 

Multiple Value 148 
Co 158 

Collection 152 
Comments 15, 134 
Common Lisp 21, 129, 131, 

140, 168 
Compact-DCs 173 
Compact-Sends 173 
Compact-Vars 172 
Compile 171 
Compiler Option 169 
Complex Numbers 131 
Complex-Number 130 
Concurrency 18 
Concurrently 18, 149 
Conditional 149 



Cons 15 
Constant 135 

Expression 135 

Predefined 133 
Constituent 158 

Number 158 
Context Future 144 
Continuation 141, 150, 151 
Copy 156 
Cput 148 
Cset 17, 18, 147, 148 

Multiple Value 148 
Cwriter Method 138 
Declare 169, 171 
Deep-copy 156 
Deep-dispose 156 
Defclass 15, 131, 137 
Defconstant 131, 135 
Defglobal 131, 135, 170 
Define 131, 136 
Defmacro 21, 168 
Defmethod 131, 135, 143, 150 
Defparameter 135 
Defselector 131, 135, 142 
Defun 14, 131, 135, 142, 150 
Delete-Dead-Defs 172 
Delete-Locals 173 
Delete-Moves 172 
Delete-Touches 172 
Describe 171 

Detailed-Progress 171, 173 
Display 167 
Display-print 167 
Display-stream 167 
Dispose 156 
Distarray 20 
Distobj 20, 152, 158 
Distributed Object 20, 152, 
158 

Creation 158 
Distributed-Class 152 
End-of-file 133, 166 
Eq 160 
Error 169 

Evaluation Order 145 
Exit 150 
Expression 144 

Constant 135 
False 133, 152 
Fast-Apply 173 
Fast-Contexts 172 
Ffib 14 
Fib 14, 121 
Fill 164 
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Index 



Float 152 

Fold-Constants 172 
Fold-Global-Constants 172 
For-each 165 
Force 19, 144, 146 
Formal 140 

Inline 140 

No-leak 140 

Not-inline 140 

Value 140 
Format 130 
Forward-Tails 172 
Frame-Migrate 172 
Frame-Regs 172 
Frame-Touches 172 
Funct 152 
Function 14, 15, 16, 140, 152 

Calling 141, 146 

Inline 141 

Non-strict 141 

Not-inline 141 

Predicate 15, 138 

Return Value 141, 150 

Side-Effect-Free 141 
Future 19, 144, 146, 150 

Caveats 146 

Context 144 

Eager 144 

Lazy 144 

Semantics 144 
Get 148, 165 
Get-group 159 
Global 133, 135 
Goals 129 

Group 131, 142, 158, 159 
Halt 169 
Identifier 131 

Undefined 133 
If 135, 149 

Immutable Class 138 
Include 169, 170 
Index 158 
Inheritance 15, 137 

Multiple 137 
Init 163, 164 
Inline 141, 172 

Class 139 

Formal 140 

Instance Variable 138 
Inline-Size-Cutoff 172 
Input 166 
Instance 

Object 15, 156 

Variable 15, 138 
Integer 152 
Integer-Array 152, 164 
Integer-Length 162 
J-Machine 129 



Join 167 

Lambda 131, 142, 150 
Large-Integer 152 
Lazy-Contexts 172 
Lazy-Future 146, 150 
Lazy-Ivar-Access 172 
Let 17, 18, 131, 147 

Multiple Value 148 
Lfib 17 

Lisp-Break 173 
Local Variable 17 
Lock 19, 152, 163 
Logand 162 
Logical-Limit 158 
Lognot 162 
Logor 162 
Logxor 162 
Loop 131, 149, 150 
LRU-Register-Allocation 172 
Macro 21, 131, 168 

Guard 168 

Optional 130 
Magnitude 152 
Map 164 
Max 160 
Merge-Code 172 
Metaclass 152 
Method 14, 15, 140, 143, 171 

Abstract 130 

Built-in 153 

Calling 141, 146 

CAS-er 138 

Cwriter 138 

Inline 141 

Non-strict 141 

Not-inline 141 

Overriding 16 

Reader 16, 138 

Return Value 141, 150 

Reverse 153 

Side-Effect-Free 141 

Writer 16, 138 
Method-Lambda 131, 142, 

150 
Min 160 
Mod 161 

Multiple Inheritance 137 
Multiple Value 141, 148, 150 
MV-clet 131, 148 
MV-cset 148 
MV-let 131, 148 
MV-set 148 
N-Nodes 172 
Name 131 

Undefined 133 
Name Space 131 
Nconcurrently 149 
Neq 160 



New 156, 158 
New-boolean-array 164 
New-integer-array 164 
New-queueing-lock 163 
New-simple-array 164 
New-simple-lock 163 
New-string 164 
Nfor-each 165 
Nil 133, 152 
No-leak Formal 140 
Non-strict 141, 144, 145 
Not 160 
Not-inline 141 

Formal 140 

Instance Variable 138 
Not-inline-default 139 
Nparallel 149, 150 
Null 133, 152 
Number 131, 133, 152 
Object 15, 16, 18, 133, 152 

Class 156 

Constituent 158 

Creation 156 

Distributed 20, 152, 158 
Creation 158 

Instance 15, 156 
Optimize-Built-Ins 172 
Optimize-Send-Self 173 
Option 169, 171 
Optional 130 
Or 160 

Short-Circuit 161 
Output 166 
Overriding 16 
Pair 15, 17 
Parallel 18, 149, 150 
Parameter 14, 135, 140 

Passing Convention 140 
Permanent-Definitions 173 
Physical-Limit 158 
Pragma 169 
Precise 172 

Predicate Function 15, 138 
Primitive 

Optional 130 
Primitive-Class 152 
Print 167 
Print-PC 173 
Program 135 
Progress 171, 173 
Put 148, 165 

Queueing-Lock 20, 152, 163 
Quote 133 
Read 167 
Read-char 167 
Read-line 167 
Read-stream 166 
Read-stream-char 166 






Read-stream-line 166 
Reader Method 16, 138 
Real 152 
Receiver 14, 142 
Reg-Variables 172 
Release 163 
Repeat 150 
Reply 150, 151 
Resource 19 
Return 151 
Return Value 141, 150 

Declaration 141 

Multiple 141 
Return-value-expected? 151 
Reverse Method 153 
SC-And 161 
SC-Or 161 
Scheme 129, 140 
Scope 131, 132 

Static 142 
Selector 14 

Restricted 153, 154 
Redefining 153 
Self 14, 131, 142, 158 
Set 17, 147, 148, 170 

Multiple Value 148 
Shallow-copy 156 
Shallow-dispose 156 
Show 170 
Show-Asm 171 
Show-Hcode 171 
Show-MDP-Hcode 171 
Side-Effect-Free 141 
Simple-Array 20, 152 
Simple-Lock 19, 152, 163 
Size 165 

Small-Integer 152 
Smalltalk-80 129, 140 
Split 167 

Split-terminal 167 
Standard-Class 152 
Statement 144 

Application 146 

Optional 130 
Stream 152, 166 
Stream-char-ready? 166 
String 133, 152, 164 
Subclass 17 
Subclass? 157 
Subtype 17 
Superclass 17, 137 
Supertype 17 
Symbol 133, 152 
System-stream 152, 166 
Terminal-stream 167 
Tfib 18 
Token 131 
Top-Level Form 135 



Touch 19, 144, 145 
True 133, 152 
Type 15, 17 

Assertion 146 

Checking 17 

Declaration 18 
Undef 136 
Value 140, 144 

Formal 140 

Multiple 141, 148, 150 

Return 141, 150 
Declaration 141 
Variable 

Instance 15, 138 
Inline 138 
Not-inline 138 

Local 17 

Scope 142 
Vflow-Optimizations 172 
Warn-Free-References 173 
When 21 
While 149 
With-locks 163 
Write 167 
Write-char 167 
Write-stream 166 
Write-stream-char 166 
Write-stream-string 166 
Write-string 167 
Writer Method 16, 138 
Xor 161 
Zero? 130, 161 
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Million-transistor processors are being manufactured today, and soon it will 
be possible to put several million transistors on one integrated circuit. While 
memory applications of this technology are clear, it is not obvious how best to 
use it for computation purposes. One possibility is the architecture of the 
Message-Driven Processor (MDP), which consists of a 32+4-bit CPU, memory, 
and a network interface together on one chip. MDPs can be connected directly 
to each other to form a 65536-processor, message-passing, MIMD, parallel 
computer, the J-Machine. The MDP's architecture is unusual in that it 
provides a very high processing power to memory ratio. 
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Block 13 continued: 

Concurrent Smalltalk is the primary language used for programming the 
J-Machine. Concurrent Smalltalk is the language of choice because it fits 
the J-Machine's fine-grain, message-passing model well. This thesis 
describes Concurrent Smalltalk and its implementation on the J-Machine, 
including the Optimist II compiler and Cosmos operating system. Optimist II 
can perform global optimization of programs, including inline function 
expansion, type inference, and global evaluation of constant expressions. Next, 
Cosmos and the Concurrent Smalltalk runtime environment are described. 
Finally, some quantitative and qualitative results are presented. The grain 
size (the average amount of time a method executes before suspending) was 
found to be about 60 instructions, and the MDP was found to execute one 
instruction every two or four cycles, depending on whether external DRAM is 
used. A number of qualitative issues are described, along with a few 
preliminary results for addressing difficult problems such as controlling parallelism. 
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